Feynman and Computation 


An Overview 


Introduction 


When the Feynman Lectures on Computation were finally published in September 
1996, I promised a complementary volume that would address the ‘advanced topics’ 
covered in Feynman’s course but excluded from the published version. Over the 
years that Feynman taught the course, he invited guest lectures from experts like 
Marvin Minsky, Charles Bennett, John Cocke and several others. Since these lec- 
tures covered topics at the research frontier, things have moved on considerably in 
most areas during the past decade or so. Thus, rather than attempt to transcribe 
the often incomplete records of these original lectures, it seemed more relevant, ap- 
propriate and exciting to invite the original lecturers to contribute updated versions 
of their lectures. In this way, a much more accurate impression of the intellectual 
breadth and stimulation of Feynman’s course on computation would be achieved. 
In spite of some change to this philosophy along the way, I am satisfied that this 
book completes the picture and provides a published record of Feynman’s long- 
standing and deep interest in the fundamentals of computers. The contributions 
are organised into five sections whose rationale I will briefly describe, although it 
should be emphasized that there are many delightful and intriguing cross links and 
interconnections between the different contributions. 


Feynman’s Course on Computation 


The first part is concerned with the evolution of the Feynman computation lectures 
from the viewpoint of the three colleagues who participated in their construction. 
The contributions consist of brief reminiscences together with a reprint from each 
on a topic in which they had shared a mutual interest with Feynman. In 1981/82 
and 1982/83, Feynman, John Hopfield and Carver Mead gave an interdisciplinary 
course at Caltech entitled ‘The Physics of Computation.’ The different memories 
that John Hopfield and Carver Mead have of the course make interesting reading. 
Feynman was hospitalized with cancer during the first year and Hopfield remembers 
this year of the course as ‘a disaster,’ with himself and Mead wandering ‘over an 
immense continent of intellectual terrain without a map.’ Mead is more charitable 
in his remembrances but both agreed that the course left many students mystified. 
After a second year of the course, in which Feynman was able to play an active role, 
the three concluded that there was enough material for three courses and that each 
would go his own way. 


The next year, 1983/84, Gerry Sussman was visiting Caltech on sabbatical leave 
from MIT intending to work on astrophysics. Back at MIT, Sussman supervised 
Feynman’s son, Carl Feynman, as a student in Computer Science, and at Caltech, 
Feynman had enjoyed Abelson and Sussman’s famous ‘Yellow Wizard Book’ on ‘The 
Structure and Interpretation of Computer Programs.’ Feynman therefore invited 
Sussman to lunch at the Athenaeum, the Caltech Faculty Club, and agreed a char- 
acteristic ‘deal’ with him — Sussman would help Feynman develop his course on 
the ‘Potentialities and Limitations of Computing Machines’ in return for Feynman 
having lunch with him after the lectures. As Sussman says, ‘that was one of the 
best deals I ever made in my life.’ 


Included with these reminiscences are three reprints which indicate the breadth 
of Feynman’s interests — Hopfield on the collective computational properties of 
neural networks, Mead on an unconventional approach to electrodynamics without 
Maxwell, and Sussman and Wisdom on numerical integrations of the orbit of Pluto, 
carried out on their ‘digital orrery’. 


Reducing the Size 


Part 2 is concerned with the limitations due to size. The section begins with a 
reprint of Feynman’s famous 1959 lecture ‘There’s Plenty of Room at the Bottom’, 
subtitled ‘an invitation to enter a new field of physics’. In this astonishing lecture, 
given as an after-dinner speech at a meeting of the American Physical Society, 
Feynman talks about ‘the problem of manipulating and controlling things on a 
small scale’, by which he means the ‘staggeringly small world that is below’. He 
goes on to speculate that ‘in the year 2000, when they look back at this age, they 
will wonder why it was not until the year 1960 that anybody began seriously to 
move in this direction’. In this talk Feynman also offers two prizes of $1000 — one 
‘to the first guy who makes an operating electric motor... [which] is only 1/64 inch 
cube’, and a second ‘to the first guy who can take the information on the page of 
a book and put it on an area 1/25,000 smaller in linear scale in such a manner 
that it can be read by an electron microscope.’ He paid out on both — the first, 
less than a year later, to Bill McLellan, a Caltech alumnus, for a miniature motor 
which satisfied the specifications but which was somewhat of a disappointment to 
Feynman in that it required no new technical advances. Feynman gave an updated 
version of his talk in 1983 to the Jet Propulsion Laboratory. He predicted ‘that 
with today’s technology we can easily... construct motors a fortieth of that size in 
each dimension, 64,000 times smaller than... McLellan’s motor, and we can make 
thousands of them at a time.’ 


Fig. 1. Feynman examining Bill McLellan’s micromotor. The motor was six thousandths 
of an inch in diameter and could generate one millionth of a horsepower. By the time 
McLellan brought his motor for Feynman to examine for the prize, there had been many 
other would-be inventors anxious to show Feynman their versions of a micromotor. 
Feynman knew at once that McLellan was different: he was the only one to have brought 
a microscope to view the motor. 


It was not for another 26 years that he had to pay out on the second prize, this 
time to a Stanford graduate student named Tom Newman. The scale of Feynman’s 
challenge was equivalent to writing all twenty-four volumes of the Encyclopaedia 
Britannica on the head of a pin: Newman calculated that each individual letter 
would be only about fifty atoms wide. Using electron-beam lithography when his 
thesis advisor was out of town, he was eventually able to write the first page of 
Charles Dickens’ A Tale of Two Cities at 1/25,000 reduction in scale. Feynman’s 
paper is often credited with starting the field of nanotechnology and there are now 
regular ‘Feynman Nanotechnology Prize’ competitions. 


The second chapter in this section is contributed by Rolf Landauer, who himself 
has made major contributions to our understanding of computational and informa- 
tional limits. Here, Landauer discusses a seminal paper by his late IBM colleague 
John Swanson, which addressed the question of ‘how much memory could be ob- 
tained from a given quantity of storage material’. Swanson’s paper appeared in 
1960, around the same time as Feynman’s ‘Room at the Bottom’ paper. In Lan- 
dauer’s opinion, ‘Feynman’s paper, with its proposal of small machines making still 
smaller machines, was that of a supremely gifted visionary and amateur; Swanson’s 
that of a professional in the field.’ Landauer also deplores the impact of fashions 
in science — while acknowledging that Feynman ‘was very far from a follower of 
fashions’. Nevertheless, such was Feynman’s influence that he could very often start 
fashions, and an unfortunate side-effect of his somewhat cavalier attitude to refer- 
encing relevant prior work — that he himself had not needed to read — was that 
scientists such as Rolf Landauer and Paul Benioff did not always get the credit they 
deserved. This was an unintended side-effect of Feynman’s way of working and I 
am sure that Feynman would have approved of their thoughtful contributions to 
this volume. 


The third chapter in this section is a reprint of a paper by Carver Mead, in which 
he revisits his semiconductor scaling predictions that formed the basis of Moore’s 
Law. In 1968, Gordon Moore had asked Mead ‘whether [quantum] tunneling would 
be a major limitation on how small we could make transistors in an integrated 
circuit’. As Mead says, this question took him on a detour that lasted nearly 30 
years. Contrary to all expectations at the time, Mead found that the technology 
could be scaled down in such a way that everything got better — circuits would run 
faster and take less power! Gordon Moore and Intel have been confirming Mead’s 
prediction for the last 30 years. 


In the last part of this section, Marvin Minsky updates and reflects upon his 1982 
paper on Cellular Vacuum — together with some thoughts about Richard Feynman 
and some trenchant comments about the importance of the certainties of quantum 
mechanics. Minsky also recalls Feynman’s suspicions of continuous functions and 
how he liked the idea that space-time might in fact be discrete: ‘How could there 
possibly be an infinite amount of information in any finite volume?’ 


Quantum Limits 


Computational limitations due to quantum mechanics are the theme of the next 
section. There is no better place to begin than with a reprint of Feynman’s famous 
paper in which he first suggested the possibility of a ‘quantum computer’. ‘Simulat- 
ing Physics with Computers’ was given as a ‘keynote speech’ at a 1981 conference 
at MIT on the ‘Physics of Computation’, organized by Ed Fredkin, Rolf Landauer 
and Tom Toffoli. At the conference, after claiming not to ‘know what a keynote 
speech is’, Feynman proceeded to give a masterful keynote presentation. In his 
talk, he credited his entire interest in the subject to Ed Fredkin and thanked him 
for ‘wonderful, intense and interminable arguments’. Feynman begins by discussing 
the question of whether a universal computer can simulate physics exactly and then 
goes on to consider whether a ‘classical’ computer can efficiently simulate quan- 
tum mechanics and its quantum probabilities. Only Feynman could discuss ‘hidden 
variables’, the Einstein-Podolsky-Rosen paradox and produce a proof of Bell’s The- 
orem, without mentioning John Bell. In fact, the paper contains no references at all 
— but it does contain the idea of simulating a quantum system using a new type of 
non-Turing, ‘quantum computer.’ It is also interesting to see Feynman confessing 
that he’s ‘not sure there’s no real problem’ with quantum mechanics. 


The next three chapters are by three of the leaders of the new research fields 
of quantum information theory and quantum computing. Paul Benioff, like Rolf 
Landauer, has probably received insufficient credit for his pioneering contributions 
to the field. In his paper on ‘Quantum Robots’, he explores some of the self- 
referential aspects of quantum mechanics and considers the quantum mechanical 
description of robots to carry out quantum experiments. 


Charles Bennett, famous for his resolution of the problem of Maxwell’s Demon 
and for his demonstration of the feasibility of reversible computation, has made 
important contributions both to the theory of quantum cryptography and quan- 
tum teleportation. In a wonderful advertisement, shown to me gleefully by Rolf 
Landauer, IBM Marketing Department went overboard on Bennett’s work on tele- 
portation. Invoking images of ‘Star Trek’, the ad proclaimed “An IBM scientist and 
his colleagues have discovered a way to make an object disintegrate in one place 
and reappear intact in another.” An elderly lady pictured in the ad talking on the 
telephone to a friend says “Stand by. I'll teleport you some goulash.” Her promise 
may be ‘a little premature,’ the ad says, but ‘IBM is working on it.’ Charles Bennett 
was embarrassed by these claims and was later quoted as saying “In any organiza- 
tion there’s a certain tension between the research end and the advertising end. I 
struggled hard with them over it, but perhaps I didn’t struggle hard enough.” His 
paper in this volume discusses recent developments in quantum information theory 
including applications of ‘quantum entanglement’ (a term used by Schrödinger as 
long ago as 1935) and possible ‘entanglement purification’ techniques. 


Mr. William H. McLellan 
Electro-Optical Systems, Inc. 
125 North Vinedo Avenue 
Pasadena, California 

Dear Mr. McLellan: 

I can’t get my […] the fascinating […] motor you showed me Saturday. How […] 
made so small? 

Before you showed […] told you I hadn’t formally set up that prize I mentioned in 
my Engineering and Science article. The reason I delayed was to try to formulate 
it to avoid legal arguments, such as showing a […] mercury […] for me, to 
straighten out any tax […], etc. But I kept putting it off and never did get 
around to it. 

But what you showed me was exactly what I had had in mind when I wrote the 
article, and you are the first to show me anything like it. So, I would like to 
[…] you the enclosed prize. You certainly deserve it. 

I am only slightly disappointed that no major new technique needed to be 
developed to make the motor. I was sure I had it small enough that you couldn’t 
do it directly, but you did. Congratulations! 

Now don’t start writing small […] I don’t intend to make good on the other one. 
Since […] 

Sincerely yours, 
Richard P. Feynman 


Fig. 2. The letter from Feynman to McLellan accompanying the $1,000 cheque. In the letter 
Feynman admits to some disappointment that he had not specified the motor small enough 
to require some new engineering techniques, but was honest enough to acknowledge that 
McLellan had made exactly the sort of motor he had had in mind when he issued the 
challenge. 


The last chapter in this section is by Richard Hughes, who was at Caltech with 
Feynman in 1981 when I was there on sabbatical. Hughes now leads a multidis- 
ciplinary research team at Los Alamos National Laboratory that has constructed 
working quantum cryptographic systems. His contribution surveys the fundamen- 
tals of quantum algorithms and the prospects for realising quantum computing 
systems using ion trap technology. 


Parallel Computation 


Feynman’s first involvement with parallel computing probably dates back to his 
time at Los Alamos during the Manhattan Project. There was a problem with the 
‘IBM group’, who were performing calculations of the energy released for different 
designs of the plutonium implosion bomb. At this date in 1944, the IBM machines 
used by the IBM group were not computers but multipliers, adders, sorters and 
collators. The problem was that the group had only managed to complete three 
calculations in nine months prior to Feynman taking charge. After he assumed 
control, there was a complete transformation and the group were able to complete 
nine calculations in three months, three times as many in a third of the time. How 
was this done? As Feynman explains in Surely You’re Joking, Mr. Feynman, his 
team used parallel processing to allow them to work on two or three problems at 
the same time. Unfortunately, this spectacular increase in productivity resulted in 
management assuming that a single job took only two weeks or so — and that a 
month was plenty of time to do the calculation for the final Trinity test configuration. 
Feynman and his team then had to do the much more difficult task of figuring out 
how to parallelise a single problem. 
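
The difference between finishing more problems per month and finishing one 
problem sooner can be made concrete with a small calculation. The sketch below 
is illustrative only: the idea of a calculation as a fixed number of machine 
stages, the stage count used, and all function names are assumptions made here, 
not details taken from Feynman’s account. 

```python
from fractions import Fraction


def sequential_time(num_problems: int, num_stages: int) -> int:
    """Time steps needed when each problem must finish before the next starts."""
    return num_problems * num_stages


def pipelined_time(num_problems: int, num_stages: int) -> int:
    """Time steps needed when a new problem enters as soon as the first stage is free.

    The first problem takes num_stages steps; every later problem finishes
    one step after the problem ahead of it.
    """
    if num_problems == 0:
        return 0
    return num_stages + (num_problems - 1)


def single_job_latency(num_stages: int) -> int:
    """Overlapping problems does not shorten any individual problem."""
    return num_stages


# Figures quoted in the text: 3 calculations in 9 months, then 9 in 3 months.
rate_before = Fraction(3, 9)  # calculations per month
rate_after = Fraction(9, 3)
throughput_gain = rate_after / rate_before
```

With a hypothetical five stages, three overlapped problems finish in 7 steps 
rather than 15, yet each one still takes 5 steps from start to finish. That is 
why a ninefold gain in throughput said nothing about how quickly a single 
calculation, such as the one for the Trinity test, could be completed. 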


During the 1980s, Feynman became familiar with two pioneering parallel 
computing systems: the Connection Machine, made by Thinking Machines Corporation 
in Boston, and the Cosmic Cube, built by Geoffrey Fox and Chuck Seitz 
at Caltech. Parallel computing was one of the ‘advanced topics’ discussed in the 
lecture course and both types of parallel architecture — SIMD or Single Instruction 
Multiple Data, exemplified by the Connection Machine, and MIMD or Multiple 
Instruction Multiple Data, exemplified by the Cosmic Cube — were analysed in 
some detail. Parallel computing was in its infancy in the early 1980s, and in the 
first chapter of this next section Feynman talks optimistically of the future for par- 
allel computing. This chapter is a reprint of a little-known talk he gave in Japan 
as the 1985 Nishina Memorial Lecture. In addition to discussing possible energy 
consumption problems and size limitations of future computers, Feynman is very 
positive about the role for parallel computing in the future. 


By contrast, in the next chapter, Geoffrey Fox reflects on the failure of parallel 
computing and computational physics to become a major focus for growth over 
the last ten years. In his view, the problem is not that parallel computing cannot 
be made to work effectively for many types of scientific problems. Instead, the 
outstanding problem is that the size of the parallel computer market has been 
insufficient to solve the difficult problem of developing high quality, high-level 
parallel programming environments that are both easy to use and also offer a 
straightforward migration path for users with a significant investment in existing 
sequential software. Feynman’s optimistic suggestion that ‘programmers will just 
have to learn how to do it,’ while true for ‘Grand Challenge’ scientific problems, 
has not yet come true in a commercial sense. Fox offers an alternative vision of 
the future, encapsulated by the term ‘Internetics’. It is in the context of the 
World Wide Web and commodity hardware and software that parallel computing has a 
commercially viable future. 


Fig. 3. Tom Newman programmed the electron beam lithography machine to write out 
the first page of the novel A Tale of Two Cities by Charles Dickens. The reduction in size 
is 25,000 to 1 and each letter is only about 50 atoms wide. 


In the third chapter of this section, Feynman’s first-hand involvement with par- 
allel computing is chronicled by parallel computer pioneer Danny Hillis. Feynman’s 
son Carl, then an undergraduate at MIT, was helping Hillis with his ambitious 
thesis project to design a new type of parallel computer powerful enough to solve 
common sense reasoning problems. Over lunch, one day in the spring of 1983, Hillis 
told Feynman he was founding a company to build this machine. After saying that 
this was ‘positively the dopiest idea I ever heard’, Feynman agreed to work as a con- 
sultant for the new company. As Hillis recounts, when Feynman was told the name 
of the company ‘Thinking Machines Corporation’ he was delighted. “That’s good. 
Now I don’t have to explain to people that I work with a bunch of loonies. I can 
just tell them the name of the company.” What shines through the article by Hillis 
is Feynman’s need to be involved with the details — with the implementation of 
Hopfield’s neural networks, with a clever algorithm for computing a logarithm, and 
with Quantum Chromo-Dynamics using a parallel-processing version of BASIC he 
had devised. Feynman’s triumph came with the design of the message router that 
enabled the 64,000 processors of the machine to communicate. Using an analysis 
based on differential equations, he had come up with a more efficient design than 
that of the engineers who had used conventional discrete methods in their analysis. 
Hillis describes how engineering constraints on chip size forced them to set aside 
their initial distrust of Feynman’s solution and use it in the final machine design. 


One of the earliest applications on the Connection Machine was John Conway’s 
‘Game of Life’, which is an example of a cellular automaton model. Feynman was 
always interested in the idea that down at the bottom, space and time might ac- 
tually be discrete. What we observe as continuous physics might be merely the 
large-scale average of the behaviour of vast numbers of tiny cells. In the final con- 
tribution to this section, Norman Margolus, who gave two lectures in Feynman’s 
original course — one on reversible logic and billiard ball computers, and a second 
on cellular automata — updates these ideas and explores them in detail. Margolus 
describes how he and Tom Toffoli built several generations of ‘SIMD’ cellular 
automata computers but does not include his account of one of Feynman’s not-so-epic 
adventures. Margolus had given Feynman one of their cellular automata computers 
for his own use and as a result, Feynman had gone out to buy a color monitor, 
to better display the ‘live’ simulations. On leaving the store Feynman tripped and 
bumped his head badly. Although he did not think it serious at the time, he became 
unable to drive a car safely or even answer basic questions in his famous Physics-X 
class at Caltech. Apparently, although alarmed by these symptoms, Feynman put 
them down to his getting old and ‘losing it.’ It was some time before the doctors un- 
derstood what the problem was and drilled a hole in his head to relieve the pressure 
so that Feynman became smart again! Losing your intelligence is not such a funny 
adventure. I am also grateful to Norman for his suggestion to include Feynman’s 
paper on ‘Simulating Physics with Computers’ in this volume. 
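
For readers who have not met a cellular automaton, the ‘Game of Life’ mentioned 
above can be stated in a few lines. The following is a minimal sequential sketch 
of Conway’s standard rules, assuming an unbounded grid represented as a set of 
live cells; it is not a description of how the Connection Machine or the 
Margolus and Toffoli hardware implemented it, and the names are illustrative. 

```python
from collections import Counter


def life_step(live_cells):
    """Advance Conway's Game of Life by one generation.

    live_cells is a set of (x, y) tuples on an unbounded grid.
    """
    # For every cell adjacent to a live cell, count its live neighbours.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next generation if it has exactly 3 live neighbours,
    # or if it is alive now and has exactly 2.
    return {
        cell
        for cell, count in neighbour_counts.items()
        if count == 3 or (count == 2 and cell in live_cells)
    }


# A "blinker": three cells in a row that oscillate with period 2.
blinker = {(0, 1), (1, 1), (2, 1)}
after_one = life_step(blinker)
after_two = life_step(after_one)
```

Every cell obeys the same purely local rule, which is what makes such models a 
natural fit for machines that apply one instruction to many data elements at once. 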


Fundamentals 


The last part is entitled ‘Fundamentals’ and leads off with a reprint of John Archibald 
Wheeler’s paper on ‘Information, Physics, Quantum — The Search for Links’. As 
Rolf Landauer has said, ‘Wheeler’s impact on quantum computation has been sub- 
stantial — through his papers, his involvement in meetings, and particularly through 
his students and associates.’ Feynman was an early student of Wheeler, of course, 
and so was Wojciech Zurek, also a contributor to this volume. In Zurek’s view, 
the paper by Wheeler, first published in 1989, is ‘still a great, forward-looking call 
to arms’. The credo of the paper is summarized by the slogan It from Bit — the 
hypothesis that every item of the physical world, be it particle or field of force, 
ultimately derives its very existence from apparatus-solicited answers to binary, 
yes/no questions. 


Another influential figure in the computational community is Ed Fredkin, who 
first met Feynman in 1962. Fredkin and Marvin Minsky were in Pasadena with 
nothing to do one evening and they ‘sort of invited themselves’ to Feynman’s house. 
The three discussed many things until the early hours of the morning and, in par- 
ticular, the problem of whether a computer could perform algebraic manipulations. 
Fredkin credits the origin of MIT’s MACSYMA algebraic computing project to that 
discussion in Pasadena. In his chapter, Fredkin discusses his time at Caltech as a 
Fairchild Scholar in 1974, and his preoccupation with reversible dynamics. The deal 
this time was that Feynman would teach Fredkin quantum mechanics and Fredkin 
would teach Feynman computer science. Fredkin believes he got the better of the 
deal: ‘It was very hard to teach Feynman something because he didn’t want to let 
anyone teach him anything. What Feynman always wanted was to be told a few 
hints as to what the problem was and then to figure it out for himself. When you 
tried to save him time by just telling him what he needed to know, he got angry 
because you would be depriving him of the satisfaction of discovering it for himself.’ 
Besides learning quantum mechanics, Fredkin’s other assignment to himself during 
this year was to understand the problem of reversible computation. They had a 
wonderful year of creative arguments and Fredkin invented Conservative Logic and 
the ‘Fredkin Gate’, which led to the billiard ball computer. During one of their 
arguments Feynman got so exasperated that he broke off the argument and started 
to quiz Fredkin about quantum mechanics. After a while he stopped the quiz and 
said “The trouble with you is not that you don’t understand quantum mechanics.” 


Small Wonder 

In the Middle Ages scholars wondered how many angels would fit on the head of a 
pin. This puzzle was never satisfactorily answered. But as the result of a recent 
technological advance at Stanford University we now know that the entire 
Encyclopaedia Britannica would comfortably fit there. 

In 1960 Richard Feynman, the Caltech physicist, offered a $1,000 prize to anyone 
who could make a printed page 25,000 times smaller while still allowing it to be 
read. A Stanford graduate student, Tom Newman, has now done it, and Feynman has 
paid him the grand. 

Newman’s technique is based on the same technology that is used to imprint 
electronic circuits on those tiny computer chips that are everywhere. Newman uses 
several electron beams to trace letters made up of dots that are 60 atoms wide. 
The resulting text can be read with an electron microscope. 

Some technological advances bring instant rewards to humanity, while some have no 
practical use, at least for the moment. They are just amazing. In the latter 
category, chalk one up for Tom Newman, with an assist from Richard Feynman. 

Los Angeles Times, A Times Mirror Newspaper 


Fig. 4. A Los Angeles Times article reporting the winner of the second Feynman challenge, 
more than 26 years after Feynman first announced the prize. Tom Newman rang up 
Feynman to ask if the challenge was still open and then completed the project when his 
thesis advisor was away from Stanford. 
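
The ‘Fredkin Gate’ referred to above is usually defined as a controlled swap on 
three bits. The sketch below uses that standard textbook definition and checks, 
by exhaustive enumeration, the two properties that matter for Conservative 
Logic: the gate is reversible, and it never changes the number of 1s. The 
function names are illustrative and are not taken from Fredkin’s chapter. 

```python
from itertools import product


def fredkin(c, a, b):
    """Controlled swap: if the control bit c is 1, swap a and b."""
    return (c, b, a) if c else (c, a, b)


all_inputs = list(product((0, 1), repeat=3))

# Reversible: applying the gate twice returns the original bits,
# and distinct inputs always give distinct outputs.
is_self_inverse = all(fredkin(*fredkin(*bits)) == bits for bits in all_inputs)
is_bijection = len({fredkin(*bits) for bits in all_inputs}) == len(all_inputs)

# Conservative: the number of 1s is the same before and after.
conserves_ones = all(sum(fredkin(*bits)) == sum(bits) for bits in all_inputs)


def and_via_fredkin(x, y):
    """Compute x AND y by feeding a constant 0 into the third input."""
    _, _, out = fredkin(x, y, 0)
    return out
```

Because ordinary logic such as AND can be recovered by supplying constant inputs 
and ignoring some outputs, a gate of this kind suffices for general computation 
while remaining reversible at every step. 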


The last two contributions are by two of the major figures in the field. The first 
is by Tom Toffoli, who helped organise the 1981 MIT conference on the Physics of 
Computation at which Feynman spoke. Toffoli is also credited by Feynman ‘for his 
help with the references’ in his Optics News paper on quantum computers, reprinted 
in the Lectures on Computation. Toffoli has had a long-standing interest in cellular 
automata and, using the idea of a fine-grained dynamical substrate underlying the 
observed dynamics, he speculates on a possible link between ‘action integrals’ in 
physics and the concept of ‘computation capacity.’ On this view, the ‘principle of 
least action’, so often used by Feynman, is ‘an expression not of Nature’s parsimony 
but of Nature’s prodigality: A system’s natural trajectory is the one that will hog 
most computational resources.’ 


The final chapter is by Wojciech Zurek whose initial interest in the subject of 
physics and information was stimulated by John Wheeler. It was also Wheeler 
who insisted that Zurek maintain a regular dialogue with Feynman when Zurek 
was appointed as a Tolman Fellow at Caltech in 1981. In his article, Zurek goes 
beyond the seminal work of Landauer and Bennett in exploring any threat to the 
second law of thermodynamics posed by an intelligent version of Maxwell’s Demon. 
The intellectual abilities of the demon are assumed to be equivalent to those of a 
universal Turing machine and the notion of algorithmic information content provides 
a measure of the storage space required to describe the system. 


Feynman Stories 


In this eclectic collection of papers on computation and information theory, the 
reader will find a number of ‘new’ Feynman stories. Murray Gell-Mann, his long- 
time colleague at Caltech, always deplored the way Feynman ‘surrounded himself 
with a cloud of myth’ and the fact that ‘he spent a great deal of time and energy 
generating anecdotes about himself’. In fact, I think the stories generate themselves. 
For example, in 1997 Ed Fredkin came to Southampton to help us celebrate the 50th 
anniversary of our Department of Electronics and Computer Science — as far as 
we know the first, specifically ‘electronics’ department in the world. Ed gave a 
talk which formed the basis for his contribution to this volume, but he also told a 
Feynman story that does not appear in his written version. With apologies to Ed, 
I would like to tell it here. 


The story concerns the so-called ‘twin paradox’ in relativity. In his book, Feyn- 
man had written “You can’t make a spaceship clock, by any means whatsoever, 
that keeps time with the clocks at home.” Now Fredkin happened to be teaching 
a course and this subject came up. In thinking about the paradox, Fredkin came 
up with a trivial way to make a spaceship clock that did keep time with the clock 
at home. Before making a fool of himself in front of his students, Fredkin thought 
he’d check with Feynman first. There was, of course, an ulterior motive for doing 
this and that was to ‘sandbag’ Feynman, a thing that Fredkin loved to do but 
rarely succeeded in doing. The telephone conversation went something like this. Fredkin 
said “It says in your book that it is impossible for a clock on the spaceship to keep 
time with a clock at home. Is that correct?” Feynman replied “What it says in the 
book is absolutely correct.” Having set him up, Fredkin countered “OK, but what 
if I made a clock this way ...” and then proceeded to describe how his proposed 
clock had knowledge of the whole trajectory and could be programmed to put the 
‘back home’ time on the face of the clock. “Wouldn’t that keep time with the clocks 
back home?” Feynman said “That is absolutely correct.” Fredkin replied “Then 
what does that mean about what’s in your book?” Feynman’s instant response was 
“What it says in the book is absolutely wrong!” 


Anyone who has had any long-term contact with Feynman will have a fund of 
stories such as this one. In all the things he did, Feynman was never afraid to 
admit he was mistaken and he constantly surprised his audience with his direct 
and unconventional responses. In this way, many of the Feynman stories generate 
themselves without any overt act of creation by Feynman himself. 


Research and Teaching 


What these anecdotes and these chapters illustrate is how intimately research 
and teaching were blended in Feynman’s approach to any subject. Danny Hillis 
remembers how Feynman worked on problems at Thinking Machines. While he 
was engaged in solving a problem he hated to be interrupted, but once he had 
found a solution ‘he spent the next day or two explaining it to anyone who would 
listen.’ Explanation and communication of his understanding were an essential 
part of Feynman’s methodology. He also had no problem about the fact that he 
was sometimes recreating things that other people already knew — in fact I don’t 
think he could learn a subject any other way than by finding out for himself. 


Carver Mead, however, remembers another, more combative side to Feynman. 
Besides improving his skills on integrals and numerology in duels with Hans Bethe, 
the hot-house atmosphere of Los Alamos during the war had honed Feynman’s 
skills in argument: ‘The one who blinked first lost the argument.’ As Mead says, 
‘Feynman learned the game well — he never blinked.’ For this reason, Feynman 
would never say what he was working on: He preferred ‘to spring it, preferably 
in front of an audience, after he had it all worked out.’ Mead learnt to tell what 
problems Feynman cared about by noticing which topics made him mad when they 
were brought up. Furthermore, Mead goes on to say, if Feynman was stuck about 
something, ‘he had a wonderful way of throwing up a smoke screen’ which Mead 
calls Feynman’s “proof by intimidation.” 


Feynman’s grasp of the big picture, coupled with his love for knowing first-hand 
of practical details — from low-level programming to lock-picking — gave him an 
almost unique perspective on any subject he chose to study. It was this mastery, 
both of the minutiae of a subject and of its broad intellectual framework, that gave 
him the seemingly effortless ability to move back and forth between the two levels 
at will, without getting lost in the detail or losing the overall plot. 


How to be an Editor 


Feynman was an inspiring teacher who declined the ‘easy’ option of giving the 
same course every year. He chose to spend a large part of the last decade of his 
life thinking about the fundamentals of computation. Stan Williams, who works at 
Hewlett-Packard on nanostructures, quotes me as saying that the Feynman Lectures 
on Computation were the most important thing I have done in my career. Now I 
am not sure that I quite said that, but it is true that I am glad his last lectures 
have seen the light of day. Furthermore, with this volume, the links and connections 
with the people in the computational community that he was inspired by, or who 
were inspired by him, are recorded. 


When I took on the job of putting together this second volume, I fondly imag- 
ined it would be easier than constructing the first from rough notes and tapes. I 
little knew what skills an ‘editor’ requires. Getting agreement in principle for a 
contribution is easy: Getting the contribution in reality is much more difficult. Some 
examples will make the point. Marvin Minsky was wonderfully encouraging about 
the project initially — but I felt bad at having to telephone Marvin at his home at 
regular intervals, badgering him for his paper. Gerry Sussman daily demonstrates 
an incredible breadth and depth of knowledge, on subjects ranging from program- 
ming in SCHEME to the foundations of classical mechanics. On talking with him 
and Tom Knight at MIT, he described their current research project by holding 
up his hand and saying “I want to know how to program this.” It is therefore not 
surprising that I found it difficult to intrude on his manifold activities and persuade 
him to set them aside for the time required to complete his brief contribution to this 
volume. I’m glad he did, since his contribution to Feynman’s course was certainly 
worthy of acknowledgement. 


A special note of thanks is owing to Rolf Landauer: He not only was first 
to deliver his text but he was also wise enough to apply subtle pressure on me to 
complete the task. This he did by telling me he had no doubts about my skills to put 
together an exciting volume. There certainly were times when I doubted whether I 
would be able to persuade Charles Bennett to devote enough time to write up the 
talk he had given at our Southampton Electronics celebrations. Since Charles was 
one of those who had been responsible for educating Feynman about the field, and 
had participated in the original lecture course, I felt it was important to persevere. 
Finally, I hit on the idea of telling him that his colleague, Rolf Landauer, did not 
think he would make my final, final deadline ... And of course, I should thank not 
only those who gave me worries but delivered, but also those who delivered on time, 
to a tight schedule. To all of them, many thanks — I hope you like the result. 


Part I 


Feynman’s Course on 
Computation 


FEYNMAN AND COMPUTATION 


John J. Hopfield 


In early 1981, Carver Mead and I came up with an idea for a year-long course 
on computation as a physical process, trying to consider digital computers, analog 
computers, and the brain as having things in common. Carver thought that Richard 
Feynman would be interested in joining the enterprise, and indeed he was, so the 
three of us did a little planning. At that juncture, Feynman had one of his heroic 
bouts with cancer, and Mead and I went on alone. 


The course was a disaster. Lectures from different visitors were not coordinated, 
we wandered over an immense continent of intellectual terrain without a map, the 
students were mystified and attendance fell off exponentially. Carver and I learned 
a great deal about the physics of computation. More important, we learned never, 
never to try such a thing again. 


However, when I ran across a recovered Feynman in October 1982, he immedi- 
ately asked “Hey, what happened to that course me and you and Mead were going 
to give?” I gave a terse and charitable interpretation of events, to which Feynman 
immediately asked (Mead being on leave from teaching at the time) whether he 
and I could try it again, saying he would really do his share and more importantly, 
organize the subject his way. So it was that his course on the physics of computa- 
tion was born, in the winter of 1983. Mead was also dragooned into lecturing a bit. 
I will recount a little from this era, the class, and our lunchtime meetings to talk 
about the subject. 


The format consisted of two lectures a week. The first of these lectures was 
usually from an outside lecturer. Marvin Minsky led off. The second lecture in 
a week was a critique by Feynman of the first lecture. He would summarize the 
important points from the first lecture with his own organization, integrating the 
lecture into the rest of science and computer science in Feynman’s inimitable fashion. 
Occasionally, this would become a lecture on what the speaker should have said if 
he had really understood the essence of his subject. And of course, the students 
loved it. 


There were three basic aspects of computers and computation which intrigued 
Feynman, and made the subject sufficiently interesting that he would spend time 
teaching it. First, what limits did the laws of physics place on computation? Sec- 
ond, why was it that when we wanted to do a hard problem a little better, it 
always seemed to need an exponentially greater amount of computing resources? 
Third, how did the human mind work, and could it somehow get around the second 
problem? 


Feynman himself did the lecture on quantum computation. Charles Bennett had 
already prepared the way by talking about reversible computation, and Norman 
Margolus on the billiard ball computer, so Feynman developed quantum computa- 
tion in ways which made its connection to classical reversible computers obvious. 
His lecture was chiefly developing the idea that an entirely reversible quantum sys- 
tem, without damping processes, could perform universal computation. I gave the 
second lecture, chiefly on the subject of restoration, since Feynman’s quantum com- 
puter was built on the basis of a Hamiltonian which he had designed, and whose 
parameters were mathematically exact. I pressed hard on the fact that all real 
digital computers had noise and errors in construction, that ways of restoring the 
signals after such errors were essential, and that the laws of physics would not per- 
mit the building of arbitrary Hamiltonians with precision. However, in spite of his 
interest in the physical and the reality of nature, he found such questions of little 
interest in quantum computing. 


While Paul Benioff had been thinking (and writing) about quantum computation 
before Feynman, I don’t think that Feynman had ever read any of what Benioff had 
said. As in his approach to most other subjects, Feynman simply ignored what 
had been done before, easily recreating it independently if relevant, and of course 
not referencing what he had not read. While he is often given credit for helping 
originate ideas of quantum computation, my recollections of the many conversations 
with him on the subject contain no notion of his that quantum computers could 
in some sense of N scaling be better than classical computing machines. He only 
emphasized that the physical scale and speed of computers were not limited by the 
classical world, since conceptually they could be built of reversible components at 
the atomic level. The insight that quantum computers were really different came 
only later, and to others, not to Feynman. 


He loved the nuts and bolts of computers and low-level coding for machines. At 
the time, I had a new model of associative memory, had implemented the model on 
a small scale, and was interested in exploring it on a much larger scale. Feynman 
raised the question as to whether I had considered implementing it on the Connec- 
tion Machine of his friend Danny Hillis. At the time, that machine was enormous 
in potential, but had a very elementary processor and instruction set for each of 
its 64,000 nodes. I explained why the problem I was doing did not fit onto such 
an architecture and instruction set effectively. At our next lunch, he sketched 
a different way to represent and order the operations which were needed, so that 
virtually the entire theoretical computing power of the Connection Machine was 
put into use. (My research paper reprinted in this volume is the one for which this 
programming was done, and is the only research paper of mine in which he ever 
took an interest. Strangely, he did not himself try to make mathematical models of 
how the brain worked.) 


Throughout his life, Feynman had been an observer of his own mental processes, 
and enjoyed speculating on how the brain must work on the basis of his observations. 
For example, the socks. As a student living in the Graduate College at Princeton, 
most mundane worries were taken care of. However, you had to count out your 
shirts, socks, etc. for the laundry from time to time, which he found a bore, and 
which led him to be interested in what you could do at the same time as you 
counted. So he developed the idea of counting internally to 60, and seeing if you 
could judge a fixed time period (roughly a minute, but dependent on your personal 
internal cadence for counting). If you could do a task and also at the same time 
judge accurately the passing of this time interval, then the tasks did not interfere, 
probably because different brain hardware was being used. He found, for example, 
that he could read and count (by this measure) at the same time, but that he could 
not speak and maintain a judgement of the time interval. He elaborated on this 
theme at dinner one day. John Tukey took exception to it, saying he was sure 
that by this criterion, he could count and speak at the same time. The experiment 
was done, and Tukey could indeed do both tasks at once. Feynman pursued 
Tukey on the issue, to ultimately learn that Tukey knew he generally counted not 
linguistically (as most of us do), but visually, by seeing objects group in front of 
him. Since he, like Feynman, should be able to do a visual task and a verbal task 
at the same time, he (but not Feynman!) would be able to speak and count at the 
same time. Feynman went on to explain to me that he might have known there 
would be other things going on in other minds, since for him letters and numbers 
had colors (as well as shapes) and when he was manipulating equations in his head, 
he could just use the colors. He speculated that the color-symbol link must have 
come from a set of blocks he had as a small child. 


In Southern California, the rich sometimes give fancy parties to which they invite 
the accomplished, and so it was that Francis Crick and Richard Feynman met at 
a cocktail party in Beverly Hills. Wanting to talk, not finding the occasion, they 
sought a means of getting together. Finding that I seemed to be their only common 
link, I was ultimately assigned the task of getting them together. So there was a 
happy Athenaeum (Caltech Faculty Club) lunch which I hosted and at which they 
sparred. Once again, Feynman took up the role of observer of his own mind. He 
asked why it was that when you move the eyes from one spot to another in a saccade 
(rapid eye motion) the world does not blur during the motion. Crick replied that 
the visual signals to the brain are gated off during this time. Feynman responded 
“... but I remember when I was a kid riding my bicycle fast, looking at the front 
wheel, every once and a while I could see the writing on the tire very clearly, and 
I thought it was because my eyes were accidentally moving in such a way that the 
image of the tire was stationary on my retina...” Crick quickly admitted that it 
was only a diminution, not a complete shutting off, of the visual signals, and that 
Feynman was probably right about the explanation. 


The Athenaeum was not always an intellectual haven. One day when the two of 
us were lunching, a woman who had recently read ‘Surely You Are Joking...’ (by 
Feynman, as told to Ralph Leighton) came up to our table to browbeat Feynman 
about his male chauvinist view of women as portrayed in that book. He put her off 
as best he could, then turned to me sadly, saying “It’s all Ralph’s fault, you know. 
If Ralph had been a woman, I would have told him all about my relationship with 
my first wife (who died of tuberculosis), and everyone would have a very different 
notion of me. But it is not the kind of thing which men discuss with each other, 
so Ralph never asked, and I couldn’t have said anything even if he had.” He even 
MEANT that it was Ralph’s fault. 


A scientific giant who was always on stage, he went on to develop his Physics of 
Computation course as a regular feature of Computer Science, and was disappointed 
when no one stepped forward to teach it after he ceased to do so. But what human 
of normal accomplishment could attempt such a role? 


NEURAL NETWORKS AND PHYSICAL 
SYSTEMS WITH EMERGENT COLLECTIVE 
COMPUTATIONAL ABILITIES 


John J. Hopfield * 


Abstract 


Computational properties of use to biological organisms or to the construction of 
computers can emerge as collective properties of systems having a large number 
of simple equivalent components (or neurons). The physical meaning of content- 
addressable memory is described by an appropriate phase space flow of the state of 
a system. A model of such a system is given, based on aspects of neurobiology but 
readily adapted to integrated circuits. The collective properties of this model pro- 
duce a content-addressable memory which correctly yields an entire memory from 
any subpart of sufficient size. The algorithm for the time evolution of the state of 
the system is based on asynchronous parallel processing. Additional emergent col- 
lective properties include some capacity for generalization, familiarity recognition, 
categorization, error correction, and time sequence retention. The collective prop- 
erties are only weakly sensitive to details of the modeling or the failure of individual 
devices. 


2.1 Introduction 


Given the dynamical electrochemical properties of neurons and their interconnec- 
tions (synapses), we readily understand schemes that use a few neurons to obtain el- 
ementary useful biological behavior [1-3]. Our understanding of such simple circuits 
in electronics allows us to plan larger and more complex circuits which are essential 
to large computers. Because evolution has no such plan, it becomes relevant to ask 
whether the ability of large collections of neurons to perform “computational” tasks 
may in part be a spontaneous collective consequence of having a large number of 
interacting simple neurons. 


In physical systems made from a large number of simple elements, interactions 
among large numbers of elementary components yield collective phenomena such as 
the stable magnetic orientations and domains in a magnetic system or the vortex 
patterns in fluid flow. Do analogous collective phenomena in a system of simple 
interacting neurons have useful “computational” correlates? For example, are 
the stability of memories, the construction of categories of generalization, or time- 
sequential memory also emergent properties and collective in origin? This paper 
examines a new modeling of this old and fundamental question [4-8] and shows that 
important computational properties spontaneously arise. 


*Reproduced from Proc. Natl. Acad. Sci. USA, Vol. 79, pp. 2554-2558, April 1982. Biophysics. 


All modeling is based on details, and the details of neuroanatomy and neural 
function are both myriad and incompletely known [9]. In many physical systems, the 
nature of the emergent collective properties is insensitive to the details inserted in 
the model (e.g., collisions are essential to generate sound waves, but any reasonable 
interatomic force law will yield appropriate collisions). In the same spirit, I will 
seek collective properties that are robust against change in the model details. 


The model could be readily implemented by integrated circuit hardware. The 
conclusions suggest the design of a delocalized content-addressable memory or 
categorizer using extensive asynchronous parallel processing. 


2.2 The general content-addressable memory of a physical 
system 


Suppose that an item stored in memory is “H. A. Kramers & C. H. Wannier Phys. 
Rev. 60, 252 (1941).” A general content-addressable memory would be capable of 
retrieving this entire memory item on the basis of sufficient partial information. The 
input “& Wannier, (1941)” might suffice. An ideal memory could deal with errors 
and retrieve this reference even from the input “Vannier, (1941)”. In computers, 
only relatively simple forms of content-addressable memory have been made in 
hardware [10, 11]. Sophisticated ideas like error correction in accessing information 
are usually introduced as software [10]. 


There are classes of physical systems whose spontaneous behavior can be used 
as a form of general (and error-correcting) content-addressable memory. Consider 
the time evolution of a physical system that can be described by a set of general 
coordinates. A point in state space then represents the instantaneous condition of 
the system. This state space may be either continuous or discrete (as in the case of 
N Ising spins). 


The equations of motion of the system describe a flow in state space. Various 
classes of flow patterns are possible, but the systems of use for memory particularly 
include those that flow toward locally stable points from anywhere within regions 
around those points. A particle with frictional damping moving in a potential well 
with two minima exemplifies such a dynamics. 
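

The double-well example can be sketched numerically. The following is only an illustration under stated assumptions: the overdamped limit (velocity proportional to force), a quartic potential U(x) = (x^2 - 1)^2 chosen for convenience, and a hypothetical function name `settle`; none of these appear in the text.

```python
def settle(x, steps=2000, dt=0.01):
    """Overdamped motion dx/dt = -dU/dx in the assumed potential U(x) = (x**2 - 1)**2.

    The two minima at x = -1 and x = +1 play the role of locally stable points.
    """
    for _ in range(steps):
        force = -4.0 * x * (x * x - 1.0)  # -dU/dx
        x += dt * force                   # explicit Euler step
    return x


# Starting points on either side of the barrier at x = 0 flow to different minima.
right = settle(0.3)
left = settle(-1.7)
```

Any start with x > 0 ends near +1 and any start with x < 0 ends near -1, so each minimum collects the flow from a whole region around it.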


If the flow is not completely deterministic, the description is more complicated. 
In the two-well problem above, if the frictional force is characterized by a temperature, 
it must also produce a random driving force. The limit points become 
small limiting regions, and the stability becomes not absolute. But as long as the 
stochastic effects are small, the essence of local stable points remains. 


Consider a physical system described by many coordinates $X_1 \ldots X_N$, the 
components of a state vector $X$. Let the system have locally stable limit points 
$X_a, X_b, \ldots$ Then, if the system is started sufficiently near any $X_a$, as at $X = X_a + \Delta$, 
it will proceed in time until $X \approx X_a$. We can regard the information stored in the 
system as the vectors $X_a, X_b, \ldots$ The starting point $X = X_a + \Delta$ represents a 
partial knowledge of the item $X_a$, and the system then generates the total information 
$X_a$. 


Any physical system whose dynamics in phase space is dominated by a substan- 
tial number of locally stable states to which it is attracted can therefore be regarded 
as a general content-addressable memory. The physical system will be a potentially 
useful memory if, in addition, any prescribed set of states can readily be made the 
stable states of the system. 


2.3 The model system 


The processing devices will be called neurons. Each neuron $i$ has two states like 
those of McCulloch and Pitts [12]: $V_i = 0$ (“not firing”) and $V_i = 1$ (“firing at 
maximum rate”). When neuron $i$ has a connection made to it from neuron $j$, the 
strength of connection is defined as $T_{ij}$. (Nonconnected neurons have $T_{ij} = 0$.) The 
instantaneous state of the system is specified by listing the $N$ values of $V_i$, so it is 
represented by a binary word of $N$ bits. 


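The state changes in time according to the following algorithm. For each neuron 
$i$ there is a fixed threshold $U_i$. Each neuron $i$ readjusts its state randomly in time 
but with a mean attempt rate $W$, setting 


$$V_i \to \begin{cases} 1 & \text{if } \sum_{j \neq i} T_{ij} V_j > U_i \\ 0 & \text{if } \sum_{j \neq i} T_{ij} V_j < U_i \end{cases} \qquad (2.1)$$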


Thus, each neuron randomly and asynchronously evaluates whether it is above 
or below threshold and readjusts accordingly. (Unless otherwise stated, we choose 
$U_i = 0$.) 
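

A minimal sketch of this asynchronous threshold rule follows. It assumes that "randomly in time" can be imitated by picking one neuron uniformly at random per step, and that a neuron exactly at threshold keeps its state (the rule above does not say what happens at equality). The name `async_update` and its parameters are hypothetical.

```python
import random


def async_update(T, V, U=None, steps=100, seed=0):
    """Return the state reached after `steps` single-neuron threshold updates.

    T is an N x N list of connection strengths, V a list of 0/1 states,
    U an optional list of thresholds (all zero if omitted).
    """
    rng = random.Random(seed)
    V = list(V)                      # work on a copy of the starting state
    n = len(V)
    U = [0.0] * n if U is None else U
    for _ in range(steps):
        i = rng.randrange(n)         # one randomly chosen neuron re-evaluates itself
        h = sum(T[i][j] * V[j] for j in range(n) if j != i)
        if h > U[i]:
            V[i] = 1
        elif h < U[i]:
            V[i] = 0
        # h == U[i]: state left unchanged (an assumption of this sketch)
    return V


# Two mutually inhibiting neurons cannot both stay on.
final = async_update([[0, -1], [-1, 0]], [1, 1])
```

Whichever neuron is examined first switches off, after which the other sits exactly at threshold and stays on, so `final` contains a single 1.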


Although this model has superficial similarities to the Perceptron [13, 14], the 
essential differences are responsible for the new results. First, Perceptrons were 
modeled chiefly with neural connections in a “forward” direction A → B → C → 
D. The analysis of networks with strong backward coupling A ⇄ B ⇄ C proved 
intractable. All our interesting results arise as consequences of the strong back- 
coupling. Second, Perceptron studies usually made a random net of neurons deal 
directly with a real physical world and did not ask the questions essential to finding 
the more abstract emergent computational properties. Finally, Perceptron modeling 
required synchronous neurons like a conventional digital computer. There is no 
evidence for such global synchrony and, given the delays of nerve signal propagation, 
there would be no way to use global synchrony effectively. Chiefly computational 
properties which can exist in spite of asynchrony have interesting implications in 
biology. 


2.4 The information storage algorithm 


Suppose we wish to store the set of states $V^s$, $s = 1 \ldots n$. We use the storage 
prescription [15, 16] 


$$T_{ij} = \sum_s (2V_i^s - 1)(2V_j^s - 1) \qquad (2.2)$$


but with $T_{ii} = 0$. From this definition 


$$\sum_j T_{ij} V_j^{s'} = \sum_s (2V_i^s - 1) \left[ \sum_j V_j^{s'} (2V_j^s - 1) \right] \equiv H_i^{s'} \qquad (2.3)$$


The mean value of the bracketed term in Eq. 2.3 is 0 unless $s = s'$, for which the 
mean is $N/2$. This pseudoorthogonality yields 


$$\sum_j T_{ij} V_j^{s'} \equiv \langle H_i^{s'} \rangle \approx (2V_i^{s'} - 1) N/2 \qquad (2.4)$$


and is positive if $V_i^{s'} = 1$ and negative if $V_i^{s'} = 0$. Except for the noise coming 
from the $s \neq s'$ terms, the stored state would always be stable under our processing 
algorithm. 
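

The storage prescription of Eq. 2.2 and the stability claim can be illustrated with a short sketch. It assumes zero thresholds and treats a neuron exactly at threshold as unchanged; the function names `store` and `is_stable` and the example pattern are hypothetical.

```python
def store(memories):
    """Build the connection matrix of Eq. 2.2 from a list of 0/1 memory vectors."""
    n = len(memories[0])
    T = [[0] * n for _ in range(n)]
    for V in memories:
        s = [2 * v - 1 for v in V]           # map 0/1 onto -1/+1
        for i in range(n):
            for j in range(n):
                if i != j:                   # enforce zero diagonal
                    T[i][j] += s[i] * s[j]
    return T


def is_stable(T, V):
    """True if no neuron would flip under the threshold rule with zero thresholds."""
    n = len(V)
    for i in range(n):
        h = sum(T[i][j] * V[j] for j in range(n) if j != i)
        if (h > 0 and V[i] == 0) or (h < 0 and V[i] == 1):
            return False
    return True


pattern = [1, 0, 1, 1, 0, 0, 1, 0]
T = store([pattern])
stable = is_stable(T, pattern)
```

With a single stored pattern there are no cross terms, so the field on each neuron has the sign of its stored value and `stable` is true. With several stored patterns the same check can fail for some of them, which is the noise effect discussed in the text.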


Such matrices $T_{ij}$ have been used in theories of linear associative nets [15-19] 
to produce an output pattern from a paired input stimulus, $S_1 \to O_1$. A second 
association $S_2 \to O_2$ can be simultaneously stored in the same network. But the 
confusing stimulus $0.6\, S_1 + 0.4\, S_2$ will produce a generally meaningless mixed 
output $0.6\, O_1 + 0.4\, O_2$. Our model, in contrast, will use its strong nonlinearity 
to make choices, produce categories, and regenerate information and, with high 
probability, will generate the output $O_1$ from such a confusing mixed stimulus. 


A linear associative net must be connected in a complex way with an external 
nonlinear logic processor in order to yield true computation [20, 21]. Complex cir- 
cuitry is easy to plan but more difficult to discuss in evolutionary terms. In contrast, 
our model obtains its emergent computational properties from simple properties of 
many cells rather than circuitry. 


2.5 The biological interpretation of the model 


Most neurons are capable of generating a train of action potentials (propagating 
pulses of electrochemical activity) when the average potential across their membrane 
is held well above its normal resting value. The mean rate at which action 
potentials are generated is a smooth function of the mean membrane potential, 
having the general form shown in Fig. 2.1. 


Fig. 2.1. Firing rate versus membrane voltage for a typical neuron (solid line), dropping 
to 0 for large negative potentials and saturating for positive potentials. The broken lines 
show approximations used in modeling. 


The biological information sent to other neurons often lies in a short-time average 
of the firing rate [22]. When this is so, one can neglect the details of individual action 
potentials and regard Fig. 2.1 as a smooth input-output relationship. (Parallel 
pathways carrying the same information would enhance the ability of the system to 
extract a short-term average firing rate [23, 24].) 


A study of emergent collective effects and spontaneous computation must nec- 
essarily focus on the nonlinearity of the input-output relationship. The essence of 
computation is nonlinear logical operations. The particle interactions that produce 
true collective effects in particle dynamics come from a nonlinear dependence of 
forces on positions of the particles. Whereas linear associative networks have em- 
phasized the linear central region [14-19] of Fig. 2.1, we will replace the input-output 
relationship by the dot-dash step. Those neurons whose operation is dominantly lin- 
ear merely provide a pathway of communication between nonlinear neurons. Thus, 
we consider a network of “on or off” neurons, granting that some of the intercon- 
nections may be by way of neurons operating in the linear regime. 


Delays in synaptic transmission (of partially stochastic character) and in the 
transmission of impulses along axons and dendrites produce a delay between the 
input of a neuron and the generation of an effective output. All such delays have 
been modeled by a single parameter, the stochastic mean processing time 1/W. 


The input to a particular neuron arises from the current leaks of the synapses 
to that neuron, which influence the cell mean potential. The synapses are activated 
by arriving action potentials. The input signal to a cell $i$ can be taken to be 


$$\sum_j T_{ij} V_j \qquad (2.5)$$


where $T_{ij}$ represents the effectiveness of a synapse. Fig. 2.1 thus becomes an input- 
output relationship for a neuron. 


Little, Shaw, and Roney [8, 25, 26] have developed ideas on the collective functioning 
of neural nets based on “on/off” neurons and synchronous processing. How- 
ever, in their model the relative timing of action potential spikes was central and 
resulted in reverberating action potential trains. Our model and theirs have limited 
formal similarity, although there may be connections at a deeper level. 


Most modeling of neural learning networks has been based on synapses of a 
general type described by Hebb [27] and Eccles [28]. The essential ingredient is the 
modification of $T_{ij}$ by correlations like 


$$\Delta T_{ij} = [V_i(t) V_j(t)]_{\text{average}} \qquad (2.6)$$


where the average is some appropriate calculation over past history. Decay in 
time and effects of $[V_i(t)]_{\text{avg}}$ or $[V_j(t)]_{\text{avg}}$ are also allowed. Model networks with 
such synapses [16, 20, 21] can construct the associative $T_{ij}$ of Eq. 2.2. We will 
therefore initially assume that such a $T_{ij}$ has been produced by previous experience 
(or inheritance). The Hebbian property need not reside in single synapses; small 
groups of cells which produce such a net effect would suffice. 


The network of cells we describe performs an abstract calculation and, for appli- 
cations, the inputs should be appropriately coded. In visual processing, for example, 
feature extraction should previously have been done. The present modeling might 
then be related to how an entity or Gestalt is remembered or categorized on the 
basis of inputs representing a collection of its features. 


2.6 Studies of the collective behaviors of the model 


The model has stable limit points. Consider the special case $T_{ij} = T_{ji}$, and define 


$$E = -\frac{1}{2} \sum_i \sum_{j \neq i} T_{ij} V_i V_j \qquad (2.7)$$


$\Delta E$ due to $\Delta V_i$ is given by 


$$\Delta E = -\Delta V_i \sum_{j \neq i} T_{ij} V_j \qquad (2.8)$$


Thus, the algorithm for altering $V_i$ causes $E$ to be a monotonically decreasing 
function. State changes will continue until a least (local) $E$ is reached. This case 
is isomorphic with an Ising model. $T_{ij}$ provides the role of the exchange coupling, 
and there is also an external local field at each site. When $T_{ij}$ is symmetric but has 
a random character (the spin glass) there are known to be many (locally) stable 
states [29]. 
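

The monotonic decrease of the energy for symmetric connections can be checked numerically. The sketch below assumes zero thresholds, a small random symmetric integer matrix invented for the test, and hypothetical helper names `energy` and `energy_trace`; it records the energy of Eq. 2.7 after every single-neuron update.

```python
import random


def energy(T, V):
    """Energy of Eq. 2.7: -1/2 times the sum over i != j of T[i][j] * V[i] * V[j]."""
    n = len(V)
    return -0.5 * sum(T[i][j] * V[i] * V[j]
                      for i in range(n) for j in range(n) if i != j)


def energy_trace(T, V, updates=200, seed=0):
    """Apply random single-neuron threshold updates and record the energy each time."""
    rng = random.Random(seed)
    V = list(V)
    n = len(V)
    trace = [energy(T, V)]
    for _ in range(updates):
        i = rng.randrange(n)
        h = sum(T[i][j] * V[j] for j in range(n) if j != i)
        if h > 0:
            V[i] = 1
        elif h < 0:
            V[i] = 0
        trace.append(energy(T, V))
    return trace


# Hypothetical test case: symmetric integer couplings with zero diagonal.
rng = random.Random(1)
N = 8
T = [[0] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        T[i][j] = T[j][i] = rng.randint(-3, 3)
V0 = [rng.randint(0, 1) for _ in range(N)]
trace = energy_trace(T, V0)
never_increases = all(b <= a for a, b in zip(trace, trace[1:]))
```

For a symmetric matrix `never_increases` is guaranteed by Eq. 2.8. Replacing the symmetric fill with independent entries for `T[i][j]` and `T[j][i]` removes that guarantee.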


Monte Carlo calculations were made on systems of N=30 and N=100, to examine 
the effect of removing the $T_{ij} = T_{ji}$ restriction. Each element of $T_{ij}$ was 
chosen as a random number between -1 and 1. The neural architecture of typical 
cortical regions [30, 31] and also of simple ganglia of invertebrates [32] suggests the 
importance of 100-10,000 cells with intense mutual interconnections in elementary 
processing, so our scale of $N$ is slightly small. 


The dynamics algorithm was initiated from randomly chosen initial starting 
configurations. For N=30 the system never displayed an ergodic wandering through 
state space. Within a time of about 4/W it settled into limiting behaviors, the 
commonest being a stable state. When 50 trials were examined for a particular 
such random matrix, all would result in one of two or three end states. A few stable 
states thus collect the flow from most of the initial state space. A simple cycle also 
occurred occasionally, for example ... A → B → A → B ... 


The third behavior seen was chaotic wandering in a small region of state space. 
The Hamming distance between two binary states A and B is defined as the number 
of places in which the digits are different. The chaotic wandering occurred within a 
short Hamming distance of one particular state. Statistics were done on the probability 
$p_i$ of the occurrence of a state in a time of wandering around this minimum, 
and an entropic measure of the available states $M$ was taken 


$$\ln M = -\sum_i p_i \ln p_i \qquad (2.9)$$


A value of M=25 was found for N=30. The flow in phase space produced by 
this model algorithm has the properties necessary for a physical content-addressable 
memory whether or not $T_{ij}$ is symmetric. 
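

The two measures used in this analysis, the Hamming distance and the entropic count of Eq. 2.9, can be written down directly. This is a sketch only: it assumes the probabilities are estimated as visit frequencies over a recorded list of states, and the names `hamming` and `effective_states` are hypothetical.

```python
import math
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length binary states differ."""
    return sum(x != y for x, y in zip(a, b))


def effective_states(visited):
    """M of Eq. 2.9, with each probability estimated as a visit frequency."""
    counts = Counter(tuple(state) for state in visited)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


# Four distinct states visited equally often give M close to 4.
M = effective_states([(0, 0), (0, 1), (1, 0), (1, 1)])
```

A system stuck in one state gives M = 1, while equal time spent in k states gives M = k, so M reads as an effective number of states visited.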


Simulations with N=100 were much slower and not quantitatively pursued. 
They showed qualitative similarity to N=30. 


Why should stable limit points or regions persist when $T_{ij} \neq T_{ji}$? If the algorithm 
at some time changes $V_i$ from 0 to 1 or vice versa, the change of the energy 
defined in Eq. 2.7 can be split into two terms, one of which is always negative. The 
second is identical if $T_{ij}$ is symmetric and is “stochastic” with mean 0 if $T_{ij}$ and 
$T_{ji}$ are randomly chosen. The algorithm for $T_{ij} \neq T_{ji}$ therefore changes $E$ in a 
fashion similar to the way $E$ would change in time for a symmetric $T_{ij}$ but with an 
algorithm corresponding to a finite temperature. 


About 0.15 N states can be simultaneously remembered before error in recall is 
severe. Computer modeling of memory storage according to Eq. 2.2 was carried out 
for N=30 and N=100. n random memory states were chosen and the corresponding 
$T_{ij}$ was generated. If a nervous system preprocessed signals for efficient storage, the 
preprocessed information would appear random (e.g., the coding sequences of DNA 
have a random character). The random memory vectors thus simulate efficiently 
encoded real information, as well as representing our ignorance. The system was 
started at each assigned nominal memory state, and the state was allowed to evolve 
until stationary. 


Typical results are shown in Fig. 2.2. The statistics are averages over both the 
states in a given matrix and different matrices. With n=5, the assigned memory 
states are almost always stable (and exactly recallable). For n=15, about half of 
the nominally remembered states evolved to stable states with less than 5 errors, 
but the rest evolved to states quite different from the starting points. 


These results can be understood from an analysis of the effect of the noise terms.
In Eq. 2.3, $H_i^{s'}$ is the “effective field” on neuron i when the state of the system is
s′, one of the nominal memory states. The expectation value of this sum, Eq. 2.4,
is ±N/2 as appropriate. The s ≠ s′ summation in Eq. 2.2 contributes no mean, but
has an rms noise of $[(n-1)N/2]^{1/2} \equiv \sigma$. For nN large, this noise is approximately
Gaussian and the probability of an error in a single particular bit of a particular
memory will be


$$P = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{N/2}^{\infty} e^{-x^2/2\sigma^2}\,dx \qquad (2.10)$$


For the case n=10, N=100, P=0.0091, the probability that a state had no errors
in its 100 bits should be about $e^{-0.91} \approx 0.40$. In the simulation of Fig. 2.2, the
experimental number was 0.6.
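
The numbers quoted above can be checked directly. The sketch below evaluates the Gaussian tail integral of Eq. 2.10 with the complementary error function, taking $\sigma = [(n-1)N/2]^{1/2}$ from the text, and uses $e^{-NP}$ as the approximation to the probability that all N bits are correct. The function names are illustrative.

```python
import math

def bit_error_probability(n, N):
    """Tail of a zero-mean Gaussian beyond N/2, with sigma = sqrt((n - 1) N / 2)."""
    sigma = math.sqrt((n - 1) * N / 2)
    return 0.5 * math.erfc((N / 2) / (sigma * math.sqrt(2)))

def prob_no_errors(n, N):
    """Approximate probability that none of the N bits of a memory is wrong."""
    return math.exp(-N * bit_error_probability(n, N))

P = bit_error_probability(10, 100)   # close to the quoted value of 0.0091
Q = prob_no_errors(10, 100)          # close to the quoted value of 0.40
```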


The theoretical scaling of n with N at fixed P was demonstrated in the sim- 
ulations going between N=30 and N=100. The experimental results of half the 
memories being well retained at n=0.15 N and the rest badly retained is expected 
to be true for all large N. The information storage at a given level of accuracy can 
be increased by a factor of 2 by a judicious choice of individual neuron thresholds. 
This choice is equivalent to using variables $\mu_i = \pm 1$, $T_{ij} = \sum_s \mu_i^s \mu_j^s$, and a threshold
level of 0.


Given some arbitrary starting state, what is the resulting final state (or statis- 
tically, states)? To study this, evolutions from randomly chosen initial states were 
tabulated for N=30 and n=5. From the (inessential) symmetry of the algorithm, 
if (101110...) is an assigned stable state, (010001...) is also stable. Therefore, 
the matrices had 10 nominal stable states. Approximately 85% of the trials ended 
in assigned memories, and 10% ended in stable states of no obvious meaning. An 
ambiguous 5% landed in stable states very near assigned memories. There was a 
range of a factor of 20 of the likelihood of finding these 10 states. 


The algorithm leads to memories near the starting state. For N=30, n=5,
partially random starting states were generated by random modification of known
memories. The probability that the final state was that closest to the initial state
was studied as a function of the distance between the initial state and the nearest
memory state. For distance < 5, the nearest state was reached more than 90% of
the time. Beyond that distance, the probability fell off smoothly, dropping to a
level of 0.2 (2 times random chance) for a distance of 12.


Fig. 2.2. The probability distribution of the occurrence of errors in the location of the
stable states obtained from nominally assigned memories.


The phase space flow is apparently dominated by attractors which are the nom- 
inally assigned memories, each of which dominates a substantial region around it. 
The flow is not entirely deterministic, and the system responds to an ambiguous 
starting state by a statistical choice between the memory states it most resembles. 


Were it desired to use such a system in an Si-based content-addressable memory, 
the algorithm should be used and modified to hold the known bits of information 
while letting the others adjust. 


The model was studied by using a “clipped” $T_{ij}$, replacing $T_{ij}$ in Eq. 2.3 by
±1, the algebraic sign of $T_{ij}$. The purposes were to examine the necessity of a
linear synapse supposition (by making a highly nonlinear one) and to examine the
efficiency of storage. Only N(N/2) bits of information can possibly be stored in
this symmetric matrix. Experimentally, for N=100, n=9, the level of errors was
similar to that for the ordinary algorithm at n=12. The signal-to-noise ratio can
be evaluated analytically for this clipped algorithm and is reduced by a factor
of $(2/\pi)^{1/2}$ compared with the unclipped case. For a fixed error probability, the
number of memories must be reduced by $2/\pi$.


With the μ algorithm and the clipped $T_{ij}$, both analysis and modeling showed
that the maximal information stored for N=100 occurred at about n=13. Some
errors were present, and the Shannon information stored corresponded to about
N(N/8) bits.


New memories can be continually added to $T_{ij}$. The addition of new memories
beyond the capacity overloads the system and makes all memory states irretrievable 
unless there is a provision for forgetting old memories [16, 27, 28]. 


The saturation of the possible size of $T_{ij}$ will itself cause forgetting. Let the
possible values of $T_{ij}$ be 0, ±1, ±2, ±3 and $T_{ij}$ be freely incremented within this
range. If $T_{ij}$ = 3, a next increment of +1 would be ignored and a next increment
of -1 would reduce $T_{ij}$ to 2. When $T_{ij}$ is so constructed, only the recent memory
states are retained, with a slightly increased noise level. Memories from the distant
past are no longer stable. How far into the past states are remembered depends
on the digitizing depth of $T_{ij}$, and 0, ..., ±3 is an appropriate level for N = 100.
Other schemes can be used to keep too many memories from being simultaneously 
written, but this particular one is attractive because it requires no delicate balances 
and is a consequence of natural hardware. 
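
The saturating update can be illustrated with a short Python sketch. It assumes that each new memory contributes an increment of $(2V_i - 1)(2V_j - 1)$ to $T_{ij}$ (the standard form of the storage rule) and that the result is simply clamped to the allowed range. The function name and the `limit` parameter are illustrative.

```python
import numpy as np

def add_memory_saturating(T, memory, limit=3):
    """Add one 0/1 memory to T, keeping every element within [-limit, +limit]."""
    s = 2 * memory - 1                # map {0, 1} -> {-1, +1}
    dT = np.outer(s, s)               # increment of +1 or -1 for each pair (i, j)
    np.fill_diagonal(dT, 0)           # no self-connections
    # Increments that would push an element past the limit are ignored.
    return np.clip(T + dT, -limit, limit)
```

Because an element sitting at +3 ignores a further +1 but responds to a -1, repeatedly writing new memories gradually overwrites the contribution of old ones, which is the forgetting behavior described above.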


Real neurons need not make synapses both of i → j and j → i. Particular
synapses are restricted to one sign of output. We therefore asked whether $T_{ij} = T_{ji}$
is important. Simulations were carried out with only one ij connection: if $T_{ij} \neq 0$,
$T_{ji} = 0$. The probability of making errors increased, but the algorithm continued
to generate stable minima. A Gaussian noise description of the error rate shows
that the signal-to-noise ratio for given n and N should be decreased by the factor
$1/\sqrt{2}$, and the simulations were consistent with such a factor. This same analysis
shows that the system generally fails in a “soft” fashion, with signal-to-noise ratio 
and error rate increasing slowly as more synapses fail. 


Memories too close to each other are confused and tend to merge. For N=100, 
a pair of random memories should be separated by 50 ± 5 Hamming units. The
case N=100, n=8, was studied with seven random memories and the eighth made 
up a Hamming distance of only 30, 20, or 10 from one of the other seven memories. 
At a distance of 30, both similar memories were usually stable. At a distance of 20, 
the minima were usually distinct but displaced. At a distance of 10, the minima 
were often fused. 
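
The figure of 50 ± 5 Hamming units follows from treating each bit of two independent random memories as differing with probability 1/2, so that the distance is binomially distributed with mean N/2 and standard deviation √N/2. A small sketch of that arithmetic (the function name is illustrative):

```python
import math

def random_pair_hamming_stats(N):
    """Mean and standard deviation of the Hamming distance between two
    independent, uniformly random N-bit states (binomial with p = 1/2)."""
    return N / 2, math.sqrt(N) / 2

mean, std = random_pair_hamming_stats(100)   # 50.0 and 5.0
```

On this scale, the test memories placed at distances of 30, 20 and 10 lie roughly 4, 6 and 8 standard deviations closer than a typical random pair.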


The algorithm categorizes initial states according to the similarity to memory 
states. With a threshold of 0, the system behaves as a forced categorizer. 


The state 00000... is always stable. For a threshold of 0, this stable state 
is much higher in energy than the stored memory states and very seldom occurs. 
Adding a uniform threshold in the algorithm is equivalent to raising the effective 
energy of the stored memories compared to the 0000 state, and 0000 also becomes 
a likely stable state. The 0000 state is then generated by any initial state that 
does not resemble adequately closely one of the assigned memories and represents 
positive recognition that the starting state is not familiar. 


Familiarity can be recognized by other means when the memory is drastically 
overloaded. We examined the case N=100, n=500, in which there is a memory 
overload of a factor of 25. None of the memory states assigned were stable. The
initial rate of processing of a starting state is defined as the number of neuron state
readjustments that occur in a time 1/2W. Familiar and unfamiliar states were 
distinguishable most of the time at this level of overload on the basis of the initial 
processing rate, which was faster for unfamiliar states. This kind of familiarity can 
only be read out of the system by a class of neurons or devices abstracting average 
properties of the processing group. 


For the cases so far considered, the expectation value of $T_{ij}$ was 0 for i ≠ j. A
set of memories can be stored with average correlations, and $\bar{T}_{ij} = C_{ij} \neq 0$ because
there is a consistent internal correlation in the memories. If now a partial new state
X is stored


$$\Delta T_{ij} = (2X_i - 1)(2X_j - 1), \qquad i, j \le k < N \qquad (2.11)$$


using only k of the neurons rather than N, an attempt to reconstruct it will generate
a stable point for all N neurons. The values of $X_{k+1} \ldots X_N$ that result will be
determined primarily from the sign of


$$\sum_{j=1}^{k} C_{ij} X_j \qquad (2.12)$$


and X is completed according to the mean correlations of the other memories. The 
most effective implementation of this capacity stores a large number of correlated 
matrices weakly followed by a normal storage of X. 


A nonsymmetric $T_{ij}$ can lead to the possibility that a minimum will be only
metastable and will be replaced in time by another minimum. Additional
nonsymmetric terms which could be easily generated by a minor modification of Hebb
synapses


$$\Delta T_{ij} = A \sum_s (2V_i^{s+1} - 1)(2V_j^s - 1) \qquad (2.13)$$


were added to $T_{ij}$. When A was judiciously adjusted, the system would spend a
while near $V_s$ and then leave and go to a point near $V_{s+1}$. But sequences longer
than four states proved impossible to generate, and even these were not faithfully
followed.


2.7 Discussion 


In the model network each “neuron” has elementary properties, and the network 
has little structure. Nonetheless, collective computational properties spontaneously 
arose. Memories are retained as stable entities or Gestalts and can be correctly 
recalled from any reasonably sized subpart. Ambiguities are resolved on a statistical 
basis. Some capacity for generalization is present, and time ordering of memories
can also be encoded. These properties follow from the nature of the flow in phase
space produced by the processing algorithm, which does not appear to be strongly 
dependent on precise details of the modeling. This robustness suggests that similar 
effects will obtain even when more neurobiological details are added. 


Much of the architecture of regions of the brains of higher animals must be made 
from a proliferation of simple local circuits with well-defined functions. The bridge 
between simple circuits and the complex computational properties of higher nervous 
systems may be the spontaneous emergence of new computational capabilities from 
the collective behavior of large numbers of simple processing elements. 


Implementation of a similar model by using integrated circuits would lead to 
chips which are much less sensitive to element failure and soft-failure than are nor- 
mal circuits. Such chips would be wasteful of gates but could be made many times 
larger than standard designs at a given yield. Their asynchronous parallel process- 
ing capability would provide rapid solutions to some special classes of computational 
problems. 


The work at California Institute of Technology was supported in part by National 
Science Foundation Grant DMR-8107494. This is contribution no. 6580 from the Division 
of Chemistry and Chemical Engineering.


FEYNMAN AS A COLLEAGUE


Carver A. Mead 


Feynman and I both arrived at Caltech in 1952 — he as a new professor of 
physics, and I as a freshman undergraduate. My passionate interest was electronics, 
and I avidly consumed any material I could find on the subject: courses, seminars, 
books, etc. As a consequence, I was dragged through several versions of standard 
electromagnetic theory: E and B, D and H, curls of curls, the whole nine yards. 
The only bright light in the subject was the vector potential, to which I was always 
attracted because, somehow, it made sense to me. It seemed a shame that the 
courses I attended didn’t make more use of it. In my junior year, I took a course 
in mathematical physics from Feynman — what a treat. This man could think 
conceptually about physics, not just regurgitate dry formalism. After one quarter 
of Feynman, the class was spoiled for any other professor. But when we looked
at the registration form for the next quarter, we found Feynman listed as teaching
high-energy physics, instead of our course. Bad luck! When our first class met, however,
here came Feynman. “So you’re not teaching high-energy physics?” I asked. “No” 
he replied, “low-energy mathematics.” Feynman liked the vector potential too; for 
him it was the link between electromagnetism and quantum mechanics. As he put it 
“In the general theory of quantum electrodynamics, one takes the vector and scalar 
potentials as fundamental quantities in a set of equations that replace the Maxwell 
equations.” I learned enough about it from him to know that, some day, I wanted 
to do all of electromagnetic theory that way. 


By 1960 I had completed a thesis on transistor physics and had become a brand
new faculty member in my own right. Fascinated by Leo Esaki’s work on tunnel 
diodes, I started my own research on electron tunneling through thin insulating 
films. Tunneling is interesting because it is a purely quantum phenomenon. Elec- 
trons below the zero energy level in a vacuum, or in the forbidden gap of a semicon- 
ductor or insulator, have wave functions that die out exponentially with distance. I 
was working with insulators sufficiently thin that the wave function of electrons on 
one side had significant amplitude on the opposite side. The result was a current 
that decreased exponentially with the thickness of the insulator. From the results, I 
could work out how the exponential depended on energy. My results didn’t fit with 
the conventional theory, which treated the insulator as though it were a vacuum. 
But the insulator was not a vacuum, and the calculations were giving us important 
information about how the wave function behaved in the forbidden gap. Feynman 
was enthusiastic about this tunneling work. We shared a graduate student, Karvel 
Thornber, who used path integral methods to work out a more detailed model of
the insulator.


In 1961 Feynman undertook the monumental task of developing a completely 
new 2-year introductory physics course. The first year covered mechanics; although 
that topic wasn’t of much interest to me, it would come up occasionally in our 
meetings on the tunneling project. When I heard that Feynman was going to do 
electromagnetic theory in the second year, I got very excited — finally someone 
would get it right! Unfortunately, it was not to be. The following quotation from 
the foreword to the Feynman Lectures on Gravitation tells the story:


“It is remarkable that concurrent with this course on gravitation Feynman was 
also creating and teaching an innovative course in sophomore (second year under- 
graduate) physics, a course that would become immortalized as the second and third 
volumes of The Feynman Lectures on Physics. Each Monday Feynman would give 
his sophomore lecture in the morning and the lecture on gravitation after lunch. 
Later in the week would follow a second sophomore lecture and a lecture for scien- 
tists at Hughes Research Laboratories in Malibu. Beside this teaching load and his 
own research, Feynman was also serving on a panel to review textbooks for the Cal- 
ifornia State Board of Education, itself a consuming task, as is vividly recounted in 
Surely You’re Joking, Mr. Feynman. Steven Frautschi, who attended the lectures 
as a young Caltech assistant professor, remembers Feynman later saying that he
was “utterly exhausted” by the end of the 1962-63 academic year.” 


I was another young Caltech assistant professor who attended the gravitation 
lectures, and I remember them vividly. Bill Wagner, with whom I still collaborate
over collective electrodynamics material, took notes, and later worked out the
mathematical presentation in the written version of the lectures. I also attended 
many of the sophomore lectures, to which I had mixed reactions. If you read vol- 
ume II of The Feynman Lectures on Physics, you will find two distinct threads. The 
first is a perfectly standard treatment, like that in any introductory book on the 
subject. In his preface, Feynman says of this material “In the second year I was 
not so satisfied. In the first part of the course, dealing with electricity and mag- 
netism, I couldn’t think of any really unique or different way of doing it.” There is 
a second thread, however, of true vintage Feynman — the occasional lectures where 
he waxed eloquent about the vector potential “E and B are slowly disappearing 
from the modern expression of physical laws; they are being replaced by A and 
φ.” Section 15-5 contains a delightful discussion about what a field is, and what
makes one field more “real” than another. He concludes “In our sense then, the A 
field is real.” In Chapter 25, he develops the equations of electrodynamics in four- 
vector form — the approach that I have adopted in the Collective Electrodynamics 
sequence. I can remember feeling very angry with Feynman when I sat in on this 
particular lecture. Why hadn’t he started this way in the first place, and saved us 
all the mess of a B field, which, as he told us himself, was not real anyway? When 
I asked him about it, he said something vague, like “There are a bunch of classical 
interactions that you can’t get at in any simple way without Maxwell’s equations.
You need the v × B term.” I don’t remember his exact words here, only the gist
of the discussion. Sure enough, when volume II of the lectures was published, in
table 15-1 the equation F = q(E + v × B) appears in the column labelled “True
Always.” The equation is true for the toy electric motor he shows in Fig. 16-1. It 
is not true in general. For a real electric motor, the B field is concentrated in the 
iron, rather than in the copper where the current is flowing, and the equation gives 
the wrong answer by a factor of more than 100! That factor is due to the failure of 
B to be “real,” precisely in Feynman’s sense. 


I was an active researcher in solid-state physics at that time, and I used the 
quantum nature of electrons in solids every day. Electrodynamics deals with how 
electrons interact with other electrons. The classical interactions Feynman was 
talking about were between electrons in metals, in which the density of electrons 
is so high that quantum interaction is by far the dominant effect. If we know how 
the vector potential comes into the phase of the electron wave function, and if the 
electron wave function dominates the behavior of metals, then why can’t we do all 
of electromagnetic theory that way? Why didn’t he use his knowledge of quantum 
electrodynamics to “take the vector and scalar potentials as fundamental quantities 
in a set of equations that replace the Maxwell equations,” as he himself had said? 
I was mystified; his cryptic answer prodded me to start working on the problem.
But every time I thought I had an approach, I got stuck. 


Bill Fairbank from Stanford had given a seminar on quantized flux in super- 
conducting rings that impressed me very much. The solid-state physics club was 
much smaller in those days, and, because I was working in electron tunneling, I was 
close to the people working on tunneling between superconductors. Their results 
were breaking in just this time frame, and Feynman gave a lecture about this topic 
to the sophomores; it appears as chapter 21 in volume 3. As I listened to that 
lecture, my thoughts finally clicked: That was how we could make the connection! 
A superconductor is a quantum system on a classical scale, and that fact allows us 
to carry out Feynman’s grand scheme. But I couldn’t get this approach to go all 
the way through at that time, so it just sat in the back of my mind all these years, 
vaguely tickling me. 


Meanwhile my work on tunneling was being recognized, and Gordon Moore 
(then at Fairchild) asked me whether tunneling would be a major limitation on 
how small we could make transistors in an integrated circuit. That question took 
me on a detour that was to last nearly 30 years, but it also led me into another 
collaboration with Feynman, this time on the subject of computation. Here’s how
it happened. In 1968, I was invited to give a talk at a workshop on semiconductor 
devices at Lake of the Ozarks. In those days, you could get everyone who was doing 
cutting-edge work in one room, so the workshops were where all the action was. 
I had been thinking about Gordon Moore’s question, and decided to make it the 
subject of my talk. As I prepared for this event, I began to have serious doubts 
about my sanity. My calculations were telling me that, contrary to all the current
lore in the field, we could scale down the technology such that everything got better:
The circuits got more complex, they ran faster, and they took less power — WOW! 
That’s a violation of Murphy’s law that won’t quit! But the more I looked at the 
problem, the more I was convinced that the result was correct, so I went ahead 
and gave the talk, to hell with Murphy! That talk provoked considerable debate, 
and at the time most people didn’t believe the result. But by the time the next 
workshop rolled around, a number of other groups had worked through the problem 
for themselves, and we were pretty much all in agreement. The consequences of this 
result for modern information technology have, of course, been staggering. 


Back in 1959, Feynman had given a lecture entitled “There’s Plenty of Room at 
the Bottom,” in which he discussed how much smaller things can be made than we 
ordinarily imagine. That talk, which appears elsewhere in this volume, had made a 
big impression on me; I thought about it often, and it would sometimes come up in
our discussions on the tunneling work. When I told him about the scaling law for
electronic devices, Feynman got jazzed. He came to my seminars on the subject,
and always raised a storm of good questions and comments. I was working with a
graduate student, Bruce Hoeneisen, and by 1971 we had worked out the details of 
how transistors would look and work when they were a factor of 100 smaller in linear 
dimension than the limits set by the prevailing orthodoxy. Recently, I had occasion 
to revisit these questions, and to review the history of what has happened in the 
industry since those papers were published. I plotted our 1971 predictions alongside 
the real data; they have held up extremely well over 25 years, representing a factor 
of several thousand in density of integrated circuit components. That review also 
appears in this volume. 


Because of the scaling work, I became completely absorbed with how the expo- 
nential increase in complexity of integrated circuits would change the way that we 
think about computing. The viewpoint of the computer industry at the time was an 
outgrowth of the industrial revolution; it was based on what was then called “the 
economy of scale.” The thinking went this way. A 1000-horsepower engine cost only 
four times as much as a 100-horsepower engine. Therefore, the cost per horsepower 
became less as the engine was made larger. It was more cost effective to make a 
few large power plants than to make many small ones. Efficiency considerations 
favored the concentration of technology in a few large installations. The same was 
evidently true of computing. One company, IBM, was particularly successful fol- 
lowing this strategy. The “Computing Center” was the order of the day — a central 
concentration of huge machines, with some bureaucrat “in charge,” and plenty of 
people around to protect the machines from anyone who might want to use them. 
This model went well with the bureaucratic mindset of the time — a mindset that 
has not totally died out even today. 


But as I looked at the physics of the emerging technology, it didn’t work that 
way at all. The time required to move data was set by the velocity of light and 
related electromagnetic considerations, so it was far more effective to put whatever
computing was required where the data were located. Efficiency considerations thus
favored the distribution of technology, rather than the concentration of technology.
The economics of information technology were the reverse of those of mechanical 
technology. I gave numerous talks on this topic and, at that time, what I had to 
say was contrary to what the industry wanted to hear. The story is best told in 
George Gilder’s book Microcosm. Feynman had started this line of thought already 
in his 1959 lecture, and we had a strong agreement on the general direction things 
were headed. He often came to my group meetings, and we had lively discussions 
on how to build a machine that would recognize fingerprints, how to organize many 
thousand little computers so they would be more efficient than one big computer, 
etc. Those discussions inevitably led us to wonder about the most distributed 
computer of all: the human brain. Years before, Feynman had dabbled in biology, 
and I had worked with Max Delbrück on the physics of the nerve membrane, so I
knew a bit about nerve tissue. John Hopfield had delved much deeper than either
Feynman or I had, and by 1982 he had a simple model, a caricature of how
computation might go on in the brain. 


The three of us decided to offer a course jointly, called “Physics of Computation.” 
The first year Feynman was battling a bout with cancer, so John and I had to go it 
alone. We alternated lectures, looking at the topic from markedly different points 
of view. Once Feynman rejoined us, we had even more fun — three totally different 
streams of consciousness in one course. The three of us had a blast, and learned a lot
from one another, but many of the students were completely mystified. After the 
third year, we decided, in deference to the students, that there was enough material 
for three courses, each with a more unified theme. Hopfield did “Neural Networks,”
Feynman did “Quantum Computing,” and I did “Neuromorphic Systems.” The 
material in the Feynman Lectures on Computation evolved during this period. 


There is a vast mythology about Feynman, much of which is misleading. He 
had a sensitive side that he didn’t show often. Over lunch one time, I told him how 
much he had meant to me in my student years, and how I would not have gone 
into science had it not been for his influence. He looked embarrassed, and abruptly 
changed the subject; but he heard me and that was what was important. In those 
days, physics was an openly combative subject — the one who blinked first lost the 
argument. Bohr had won his debate with Einstein that way, and the entire field 
adopted the style. Feynman learned the game well — he never blinked. For this 
reason, he would never tell you when he was working on something, but instead 
would spring it, preferably in front of an audience, after he had it all worked out. 
The only way that you could tell what he cared about was to notice what topics 
made him mad when you brought them up. 


If Feynman was stuck about something, he had a wonderful way of throwing 
up a smoke screen; we used to call it “proof by intimidation.” There is a good 
example in Vol. II of the Lectures on Physics, directly related to collective
electrodynamics. Section 17-8 contains the following comment: “...we would expect
that corresponding to the mechanical momentum p = mv, whose rate of change is
the applied force, there should be an analogous quantity equal to LI, whose rate 
of change is V. We have no right, of course, to say that LI is the real momentum 
of the circuit; in fact it isn’t. The whole circuit may be standing still and have 
no momentum.” Now this passage does not mean that Feynman was ignorant of 
the fact that the electrical current I is made up of moving electrons, that these 
moving electrons have momentum, and that the momentum of the electrons does 
not correspond to the whole circuit moving in space. But the relations are not as 
simple as we might expect, and they do not correspond in the most direct way to 
our expectations from classical mechanics. It is exactly this point that prevented 
me, over all these years, from seeing how to do electrodynamics without Maxwell’s 
equations. Feynman was perfectly aware that this was a sticking point, and he 
made sure that nobody asked any questions about it. There is a related comment 
in Vol. III, section 21-3: “It looks as though we have two suggestions for relations 
of velocity to momentum... The two possibilities differ by the vector potential. 
... One of them... is the momentum obtained by multiplying the mass by velocity. 
The other is a more mathematical, more abstract momentum...” 


When Feynman said that a concept was “more mathematical” or “more ab- 
stract,” he was not paying it a compliment! He had no use for theory devoid of 
physical content. In the Lectures on Gravitation, he says “If there is something very 
slightly wrong in our definition of the theories, then the full mathematical rigor may 
convert these errors into ridiculous conclusions.” He called that “carrying rigor to 
the point of rigor mortis.” At another point he is even more explicit: “...it is the 
facts that matter, and not the proofs. Physics can progress without the proofs, 
but we can’t go on without the facts... if the facts are right, then the proofs are 
a matter of playing around with the algebra correctly.” He opened a seminar one 
time with the statement “Einstein was a giant.” A hush fell over the audience. We 
all sat, expectantly, waiting for him to elaborate. Finally, he continued “His head 
was in the clouds, but his feet were on the ground.” We all chuckled, and again we 
waited. After another long silence, he concluded “But those of us who are not that 
tall have to choose!” Amid the laughter, you could see that, not only a good joke, 
but also a deep point had been made. 


Experiments are the ground on which physics must keep its feet — as Feynman 
knew well. When any of us had a new result, he was all ears. He would talk about it, 
ask questions, brainstorm. That was the only situation in which I have personally 
interacted with him without the combative behavior getting in the way. Down deep, 
he always wanted to do experiments himself. A hilarious account of how he was 
“cured” of this craving appears in Surely You’re Joking, Mr. Feynman. In the 
end, he had his wish. In 1986, he was asked to join the Rogers Commission to
investigate the Challenger disaster. After talking to the technical people, who knew 
perfectly well what the problem was, and had tried to postpone the launch, he was 
able to devise an experiment that he carried out on national, prime-time TV. In true
Feynman style, he sprang it full-blown, with no warning! In his personal appendix
to the commission report, he concluded, “For a successful technology, reality must 
take precedence over public relations, for Nature cannot be fooled.” The day after 
the report was released was Caltech graduation, and we marched together in the 
faculty procession. “Did you see the headline this morning?” he asked. “No,” I 
replied “what did it say?” “It said FEYNMAN ISSUES REPORT.” He paused, 
and then continued with great glee, “Not Caltech Professor Issues Report, not 
Commission Member Issues Report, but FEYNMAN ISSUES REPORT.” 
He was a household word, known and revered by all people everywhere who loved 
truth. His own public relations were all about reality, and were, therefore, OK. 


In 1987, one year later, his cancer came back with a vengeance, and he died in 
February, 1988. Al Hibbs, a former student, colleague, and friend of Feynman’s, 
organized a wake in grand style. Bongo drums, news clips, interviews, and tes- 
timonials. It was deeply moving — we celebrated the life of this man who had, 
over the years, come to symbolize, not just the spirit of Caltech, but the spirit of 
science itself. This man had engendered the most intense emotions I have ever felt 
— love, hate, admiration, anger, jealousy, and, above all, a longing to share and 
an intense frustration that he would not. As I walked away from Feynman’s wake, 
I felt intensely alone. He was the man who had taught me, not only what physics 
was, but also what science was all about: what it meant to really understand. He 
was the only person with whom I could have talked about doing electromagnetism 
using only the vector potential. He was the only one who would have understood 
why it was important. He was the one who could have related to this dream that I 
had carried for 25 years. This dream came direct from Feynman, from what he said, 
and from what he scrupulously avoided saying — from the crystal-clear insights he 
had, and from the topics that made him mad when I brought them up. But now 
he was gone. I would have to go it alone. I sobbed myself to sleep that night, but 
I never shared those feelings with anyone. I learned that from him too. 


In 1994, I was invited to give the keynote talk at the Physics of Computation 
conference. That invitation gave me the kickstart I needed to get going. A year 
later, I had made enough progress to ask Caltech for a year’s relief from teaching, so 
I could concentrate on the new research. In June 1997, the six graduate students 
working in my lab all received their doctoral degrees, and, for the first time since 
I joined the faculty, I was a free man. I finished the first paper on Collective 
Electrodynamics, which is included here. 


In the end, science is all in how you look at things. Collective Electrodynamics 
is a way of looking at the way that electrons interact. It is a much simpler way 
than Maxwell’s, because it is based on experiments that tell us about the electrons 
directly. Maxwell had no access to these experiments. The sticking point I men- 
tioned earlier is resolved in this paper, in a way that Feynman would have liked. 
The paper is dedicated to him in the most sincere way I know: it opens with my 
favorite quotation, the quotation that defines, for me, what science is all about. In 
his epilogue, he tells us his true motivation for giving the Lectures on Physics: “I 
wanted most to give you some appreciation of the wonderful world, and the physicist’s 
way of looking at it, which, I believe, is a major part of the true culture of 
modern times... Perhaps you will not only have some appreciation of this culture; 
it is even possible that you may want to join in the greatest adventure that the 
human mind has ever begun.” 


4 
COLLECTIVE ELECTRODYNAMICS I 


Carver A. Mead* 


Abstract 


Standard results of electromagnetic theory are derived from the direct interaction 
of macroscopic quantum systems; the only assumptions used are the Einstein-deBroglie 
relations, the discrete nature of charge, the Green’s function for the vector 
potential, and the continuity of the wave function. No reference is needed to 
Maxwell’s equations or to traditional quantum formalism. Correspondence limits 
based on classical mechanics are shown to be inappropriate. 


“But the real glory of science is that we can find a way of thinking such 
that the law is evident.” — R. P. Feynman 


4.1 Foundations of Physics 


Much has transpired since the first two decades of this century, when the conceptual 
foundations for modern physics were put in place. At that time, macroscopic me- 
chanical systems were easily accessible and well understood. The nature of electrical 
phenomena was mysterious; experiments were difficult and their interpretation was 
murky. Today, quite the reverse is true. Electrical experiments of breathtaking clar- 
ity can be carried out, even in modestly equipped laboratories. Electronic apparatus 
pervade virtually every abode and workplace. Modern mechanical experiments rely 
heavily on electronic instrumentation. Yet, in spite of this reversal in the range of 
experience accessible to the average person, introductory treatments of physics still 
use classical mechanics as a starting point. 


Ernst Mach wrote (p.596 in ref. [1]), “The view that makes mechanics the basis 
of the remaining branches of physics, and explains all physical phenomena by mechanical 
ideas, is in our judgement a prejudice ... The mechanical theory of Nature, 
is, undoubtedly, in a historical view, both intelligible and pardonable; and it may 
also, for a time, have been of much value. But, upon the whole, it is an artificial 
conception.” 


Classical mechanics is indeed inappropriate as a starting point for physics because 
it is not fundamental; rather, it is the limit of an incoherent aggregation of 
an enormous number of quantum elements. To make contact with the fundamental 
nature of matter, we must work in a coherent context where the quantum reality is 
preserved. 


*Reproduced from Proc. Natl. Acad. Sci. USA Vol. 94, pp. 6013-6018, June 1997, Physics 


R. P. Feynman wrote (p.15-8 in ref. [2]), “There are many changes in concepts 
that are important when we go from classical to quantum mechanics ... Instead of 
forces, we deal with the way interactions change the wavelengths of waves.” 


Even Maxwell’s equations have their roots in classical mechanics. They were 
conceived as a theory of the ether: They express relations between the magnetic 
field B and the electric field E, which are defined in terms of the classical force 
F = q(E + v × B) on a particle of charge q moving with velocity v. But it is the 
vector potential A, rather than the magnetic field B, that has a natural connection 
with the quantum nature of matter — as highlighted by Aharonov and Bohm [3]. 


Hamilton’s formulation of classical mechanics was — and remains — the starting 
point for the concepts underlying the quantum theory. The correspondence principle 
would have every quantum system approach the behavior of its classical-mechanics 
counterpart in the limit where the mechanical action involved is large compared 
with Planck’s constant. 


Although superconductivity was discovered in 1911, the recognition that super- 
conductors manifest quantum phenomena on a macroscopic scale [4] came too late 
to play a role in the formulation of quantum mechanics. Through modern experi- 
mental methods, however, superconducting structures give us direct access to the 
quantum nature of matter. The superconducting state is a coherent state formed 
by the collective interaction of a large fraction of the free electrons in a material. 
Its properties are dominated by known and controllable interactions within the col- 
lective ensemble. The dominant interaction is collective because the properties of 
each electron depend on the state of the entire ensemble, and it is electromagnetic 
because it couples to the charges of the electrons. Nowhere in natural phenomena 
do the basic laws of physics manifest themselves with more crystalline clarity. 


This paper is the first in a series in which we start at the simplest possible 
conceptual level, and derive as many conclusions as possible before moving to the 
next level of detail. In most cases, understanding the higher level will allow us to 
see why the assumptions of the level below were valid. In this stepwise fashion, 
we build up an increasingly comprehensive understanding of the subject, always 
keeping in view the assumptions required for any given result. We avoid introducing 
concepts that we must “unlearn” as we progress. We use as our starting point the 
magnetic interaction of macroscopic quantum systems through the vector and scalar 
potentials A and V, which are the true observable quantities. For clarity, the brief 
discussion given here is limited to situations where the currents and voltages vary 
slowly; the four-vector generalization of these relations not only removes this quasi- 
static limitation, but gives us electrostatics as well [5, 6]. 


4.2 Model System 


Our model system is a loop of superconducting wire — the two ends of the loop being 
colocated in space and either insulated or shorted, depending on the experimental 
situation. Experimentally, the voltage V between the two ends of the loop is related 
to the current I flowing through the loop by 


$$L I = \int V\,dt = \Phi \tag{4.1}$$


Two quantities are defined by this relationship: Φ, called the magnetic flux¹, and 
L, called the inductance, which depends on the dimensions of the loop. 


Current is the flow of charge: I = dQ/dt. Each increment of charge dQ carries 
an energy increment dW = V dQ into the loop as it enters². The total energy W 
stored in the loop is thus 


$$W = \int V\,dQ = \int V I\,dt = L \int \frac{dI}{dt}\, I\,dt = L \int I\,dI = \frac{1}{2} L I^2 \tag{4.2}$$


If we reduce the voltage to zero by, for example, connecting the two ends of the 
loop to form a closed superconducting path, the current I will continue to flow 
indefinitely: a persistent current. If we open the loop and allow it to do work on 
an external circuit, we can recover all the energy W. 
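

As a quick numerical check of Eq. 4.2, the following sketch ramps the current linearly from zero, uses V = L dI/dt (Eq. 4.1 differentiated), and integrates V I dt step by step. The inductance, final current, ramp time, and step count are arbitrary values assumed purely for illustration; they do not come from the text.

```python
# Numerical check of Eq. 4.2: integrate V*I dt during a current ramp and
# compare with (1/2) L I^2. All numerical values are illustrative assumptions.

def stored_energy(L, I_final, ramp_time, steps=1000):
    """Integrate V*I dt (trapezoidal rule) for a linear current ramp 0 -> I_final."""
    dt = ramp_time / steps
    dI_dt = I_final / ramp_time
    V = L * dI_dt                      # Eq. 4.1 differentiated: V = L dI/dt
    energy = 0.0
    for k in range(steps):
        I_a = dI_dt * (k * dt)         # current at the start of the step
        I_b = dI_dt * ((k + 1) * dt)   # current at the end of the step
        energy += 0.5 * V * (I_a + I_b) * dt
    return energy

L = 1e-6        # inductance in henry (assumed)
I_final = 2.0   # final current in ampere (assumed)
W_numeric = stored_energy(L, I_final, ramp_time=1e-3)
W_formula = 0.5 * L * I_final**2
print(W_numeric, W_formula)
```

The two printed numbers should agree to rounding error, and the result should not depend on the ramp time chosen, since only the final current enters Eq. 4.2.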


If we examine closely the values of currents under a variety of conditions, we 
find the full continuum of values for the quantities I, V, and Φ, except for persistent 
currents, where only certain discrete values occur for any given loop [7, 8]. By ex- 
perimenting with loops of different dimensions, we find the condition that describes 
the values that occur experimentally: 


$$\Phi = \int V\,dt = n\Phi_0 \tag{4.3}$$


Here, n is any integer, and Φ₀ = 2.06783461 × 10⁻¹⁵ volt-second is called the flux 
quantum or fluxoid; its value is accurate to a few parts in 10⁹, independent of 
the detailed size, shape, or composition of the superconductor forming the loop. 
We also find experimentally that a rather large energy — sufficient to disrupt the 
superconducting state entirely — is required to change the value of n. 
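

To make the discreteness concrete, Eq. 4.1 and Eq. 4.3 together imply that a closed loop can carry only the persistent currents for which L I equals n flux quanta. The sketch below lists a few of them; the 1 nH loop inductance is a hypothetical value chosen only for illustration, and the flux quantum is the figure quoted in the text.

```python
# Allowed persistent currents in a closed loop, from Eq. 4.1 and Eq. 4.3:
# L * I_n = n * Phi_0. The loop inductance below is a hypothetical value.

PHI_0 = 2.06783461e-15   # flux quantum in volt-second, as quoted in the text

def persistent_current(n, L):
    """Current (ampere) that corresponds to n flux quanta in a loop of inductance L."""
    return n * PHI_0 / L

L_loop = 1e-9            # 1 nH loop (assumed for illustration)
for n in range(-2, 3):
    print(n, persistent_current(n, L_loop))

# Spacing between neighbouring allowed currents.
spacing = persistent_current(1, L_loop) - persistent_current(0, L_loop)
print(spacing)
```

For this assumed loop the allowed currents are spaced by roughly two microamperes; a loop with larger inductance would have proportionally finer spacing.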


The more we reflect on Eq. 4.3, the more remarkable the result appears. The 
quantities involved are the voltage and the magnetic flux. These quantities are 
integrals of the quantities E and B that appear in Maxwell’s equations, and are 
therefore usually associated with the electromagnetic field. Experimentally, we 
know that they can take on a continuum of values, except under special conditions, 
when the arrangement of matter in the vicinity causes the flux to take on precisely 
quantized values. In Maxwell’s theory, E and B represented the state of strain 
in a mechanical medium (the ether) induced by electric charge. Einstein had a 
markedly different view (p.383 in ref. [9]): “I feel that it is a delusion to think of 
the electrons and the fields as two physically different, independent entities. Since 
neither can exist without the other, there is only one reality to be described, which 
happens to have two different aspects; and the theory ought to recognize this from 
the start instead of doing things twice.” At the most fundamental level, the essence 
of quantum mechanics lies in the wave nature of matter. Einstein’s view would 
suggest that electromagnetic variables are related to the wave properties of the 
electrons. Quantization is a familiar phenomenon in systems where the boundary 
conditions give rise to standing waves. The quantization of flux (Eq. 4.3) is a direct 
manifestation of the wave nature of matter, expressed in electromagnetic variables. 


¹This definition is independent of the shape of the loops, and applies to coils with multiple 
turns. For multiturn coils, what we call the flux is commonly referred to as the total flux linkage. 
²We use this relation to define the voltage V. 


4.3 Matter 


To most nonspecialists, quantum mechanics is a baffling mixture of waves, statistics, 
and arbitrary rules, ossified in a matrix of impenetrable formalism. By using a 
superconductor, we can avoid the statistics, the rules, and the formalism, and work 
directly with the waves. The wave concept, accessible to intuition and common 
sense, gives us “a way of thinking such that the law is evident.” Electrons in a 
superconductor are described by a wave function that has an amplitude and a phase. 
The earliest treatment of the wave nature of matter was the 1923 wave mechanics of 
deBroglie. He applied the 1905 Einstein postulate (W = ħω) to the energy W of an 
electron wave, and identified the momentum p of an electron with the propagation 
vector of the wave: p = ħk. Planck’s constant h and its radian equivalent ħ = h/2π 
are necessary for merely historical reasons: when our standard units were defined, 
it was not known that energy and frequency were the same quantity. 


The Einstein-deBroglie relations apply to the collective electrons in a supercon- 
ductor. The dynamics of the system can be derived from the dispersion relation [10] 
between ω and k. Both ω and k are properties of the phase of the wave function and 
do not involve the amplitude, which, in collective systems, is usually determined 
by some normalization condition. In a superconductor, the constraint of charge 
neutrality is such a condition. 


The wave function must be continuous in space; at any given time, we can follow 
the phase along a path from one end of the loop to the other: The number of radians 
by which the phase advances as we traverse the path is the phase accumulation φ 
around the loop. If the phase at one end of the loop changes relative to that at the 
other end, that change must be reflected in the total phase accumulation around 
the loop. The frequency ω of the wave function at any point in space is the rate 
at which the phase advances per unit time. If the frequency at one end of the loop 
(ω₁) is the same as that at the other end (ω₂), the phase difference between the 
two ends will remain constant, and the phase accumulation will not change with 
time. If the frequency at one end of the loop is higher than that at the other, the 
phase accumulation will increase with time, and that change must be reflected in 
the rate at which phase accumulates with the distance l along the path. The rate at 
which phase around the loop accumulates with time is the difference in frequency 
between the two ends. The rate at which phase accumulates with distance l is the 
component of the propagation vector k in the direction dl along the path. Thus, 
the total phase accumulated around the loop is 


$$\varphi = \int (\omega_1 - \omega_2)\,dt = \oint \mathbf{k} \cdot d\mathbf{l} \tag{4.4}$$


We can understand quantization as an expression of the single-valued nature of the 
phase of the wave function. When the two ends of the loop were connected to an 
external circuit, the two phases could evolve independently. When the ends are 
connected to each other, however, the two phases must match up. But the phase 
is a quantity that has a cyclic nature: matching up means being equal modulo 
2π. Thus, for a wave that is confined to a closed loop, and has a single-valued, 
continuous phase, the integral of Eq. 4.4 must be n2π, where n is an integer. The 
large energy required to change n is evidence that the phase constraint is a strong 
one — as long as the superconducting state stays intact, the wave function remains 
intact as well. 


These relations tell us that the magnetic flux and the propagation vector will 
be quantized for a given loop; they do not tell us how the frequency ω in Eq. 4.4 is 
related to the potential V in Eq. 4.1. To make this connection, we must introduce 
one additional assumption: The collective electron system represented by the wave 
function is made up of elemental charges of magnitude q₀. By the Einstein relation, 
the energy q₀V of an elemental charge corresponds to a frequency ω = q₀V/ħ. 


4.4 Electrodynamics 


Electrodynamics is the interaction of matter via the electromagnetic field. We can 
formulate our first relation between the electromagnetic quantities V and Φ and the 
phase accumulation φ of the wave function by comparing Eq. 4.1 with Eq. 4.4: 


$$\varphi = \int \omega\,dt = \frac{q_0}{\hbar} \int V\,dt = \frac{q_0}{\hbar}\, n\Phi_0 = n(2\pi) \tag{4.5}$$


From Eq. 4.5, we conclude that Φ₀ = h/q₀. We understand that the potential V 
and the frequency ω refer to differences in these quantities between the two ends of 
the loop. Equivalently, we measure each of these quantities at one end of the loop 
using as a reference the value at the other end of the loop. When we substitute into 
Eq. 4.5 the measured value of Φ₀ and the known value of h, we obtain for q₀ a value 
that is exactly twice the charge qₑ of the free electron. The usual explanation for this 
somewhat surprising result is that each state in the superconductor is occupied by 
a pair of electrons, rather than by an individual electron, so the elemental charge q₀ 
should be 2qₑ, rather than qₑ. None of the conclusions that we shall reach depends 
on the value of q₀. 
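

The substitution described above can be reproduced in a few lines. This is only a sketch: the values of h and of the free-electron charge are not given in the text, so the 2019 SI defined constants are assumed here, while the flux quantum is the figure quoted in Section 4.2.

```python
# Elemental charge implied by the flux quantum: q0 = h / Phi_0 (from Eq. 4.5).
# The values of h and q_e are the 2019 SI defined constants; they are not
# given in the text and are supplied here as an assumption.

H_PLANCK = 6.62607015e-34   # Planck's constant, joule-second
Q_E = 1.602176634e-19       # magnitude of the free-electron charge, coulomb
PHI_0 = 2.06783461e-15      # flux quantum quoted in the text, volt-second

q0 = H_PLANCK / PHI_0       # elemental charge of the collective system
ratio = q0 / Q_E            # expected to be very close to 2
print(q0, ratio)
```

With these inputs the ratio comes out within about a part per million of 2; the small residual reflects the older measured value of the flux quantum quoted in the text rather than any departure from q₀ = 2qₑ.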


We have established the correspondence between the potential V and the fre- 
quency w — the time integral of each of these equivalent quantities in a closed 
loop is quantized. The line integral of the propagation vector k around a closed 
loop also is quantized. We would therefore suspect the existence of a corresponding 
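electromagnetic quantity, whose line integral is the magnetic flux Φ. That quantity 
is the well-known vector potential A. The general relations among these quantities, 
whether or not the loop is closed, are 


$$\begin{aligned}
\text{Phase}\quad \varphi &= \int \omega\,dt = \oint \mathbf{k} \cdot d\mathbf{l} \\
\text{Flux}\quad \Phi &= \int V\,dt = \oint \mathbf{A} \cdot d\mathbf{l}
\end{aligned}
\qquad \Phi = \frac{\hbar}{q_0}\,\varphi \tag{4.6}$$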


Eq. 4.6 expresses the first set of fundamental relations of collective electrodynamics. 


4.5 Coupling 


Up to this point, we have tentatively identified the phase accumulation and the 
magnetic flux as two representations of the same physical entity. We assume that 
“winding up” the wave function with a voltage produces a propagation vector in the 
superconductor related to the motion of the electrons, and that this motion corre- 
sponds to a current because the electrons are charged. This viewpoint will allow us 
to understand the interaction between two coupled collective electron systems. We 
shall develop these relations in more detail when we study the current distribution 
within the wire itself. 


Let us consider two identical loops of superconducting wire, the diameter of the 
wire being much smaller than the loop radius. We place an extremely thin insulator 
between the loops, which are superimposed on each other as closely as allowed by 
the insulator. In this configuration, both loops can be described, to an excellent 
approximation, by the same path in space, despite their being electrically distinct. 
As we experiment with this configuration, we make the following observations. 


(i) When the two ends of the second loop are left open, its presence has no effect 
on the operation of the first loop. The relationship between a current flowing in 
the first loop and the voltage observed between the ends of the first loop follows 
Eq. 4.1 with exactly the same value of L as that observed when the second loop 
was absent. 


(ii) The voltage observed between the two ends of the second loop under open 
conditions is almost exactly equal to that observed across the first loop. 


(iii) When the second loop is shorted, the voltage observed across the first loop 
is nearly zero, independent of the current. 


(iv) The current observed in the second loop under shorted conditions is nearly 
equal to that flowing in the first loop, but is of the opposite sign. 


Similar measurements performed when the loops are separated allow us to ob- 
serve how the coupling between the loops depends on their separation and relative 
orientation. 


(v) For a given configuration, the voltage observed across the second loop re- 
mains proportional to the voltage across the first loop. The constant of proportion- 
ality, which is nearly unity when the loops are superimposed, decreases with the 
distance between the loops. 


(vi) The constant of proportionality decreases as the axes of the two loops are 
inclined with respect to each other, goes to zero when the two loops are orthogonal, 
and reverses when one loop is flipped with respect to the other. 


Observation i tells us that the presence of electrons in the second loop does not 
per se affect the operation of the first loop. The voltage across a loop is a direct 
manifestation of the phase accumulation around the loop. Observation ii tells us 
that current in a neighboring loop is as effective in producing phase accumulation 
in the wave function as is current in the same loop. The ability of current in 
one location to produce phase accumulation in the wave function of electrons in 
another location is called magnetic interaction. Observation vi tells us that the 
magnetic interaction is vectorial in nature. After making these and other similar 
measurements on many configurations, involving loops of different sizes and shapes, 
we arrive at the proper generalization of Eqs. 4.1 and 4.6: 


$$\begin{aligned}
\int V_1\,dt &= \oint \mathbf{A} \cdot d\mathbf{l}_1 = \Phi_1 = L_1 I_1 + M I_2 \\
\int V_2\,dt &= \oint \mathbf{A} \cdot d\mathbf{l}_2 = \Phi_2 = M I_1 + L_2 I_2
\end{aligned} \tag{4.7}$$


Here, the line elements dl₁ and dl₂ are taken along the first and second loops, 
respectively. The quantity M, which by observation vi can be positive or negative 
depending on the configuration, is called the mutual inductance; it is a measure of 
how effective the current in one loop is at causing phase accumulation in the other. 
When L₁ = L₂ = L, the magnitude of M can never exceed L. Observations i–iv 
were obtained under conditions where M ≈ L. Experiments evaluating the mutual 
coupling of loops of different sizes, shapes, orientations, and spacings indicate that 
each element of wire of length dl carrying the current I makes a contribution to 
A that is proportional to I, and to the inverse of the distance r from the current 
element to the point at which A is evaluated: 


$$\mathbf{A} = \frac{\mu_0}{4\pi} \int \frac{I}{r}\,d\mathbf{l} = \frac{\mu_0}{4\pi} \int \frac{\mathbf{J}}{r}\,d\mathrm{vol} \tag{4.8}$$


The constant μ₀ is called the permeability of free space. The second form follows 
from the first if we visualize a distribution of current as carried by a large number of 
wires of infinitesimal cross section, and the current density J as being the number of 
such wires per unit area normal to the current flow. The 1/r form of the integrand 
of Eq. 4.8 is called the Green’s function; it tells us how the vector potential is 
generated by currents everywhere in space. It is perhaps more correct to say that 
the vector potential is a bookkeeping device for evaluating the effect at a particular 
point of all currents everywhere in space. Ernst Mach wrote (p.317 in ref. [1]), “We 
cannot regard it as impossible that integral laws ... will some day take the place of 
the ... differential laws that now make up the science of mechanics ... In such an 
event, the concept of force will have become superfluous.” Eqs. 4.6 and 4.8 are the 
fundamental integral laws for collective electromagnetic interaction. The equivalent 
differential equation is ∇²A = −μ₀J [5, 6]. 
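

The 1/r law of Eq. 4.8, combined with the flux integral of Eq. 4.7, can be explored numerically. The sketch below approximates two circular loops by short straight segments and sums dl₁ · dl₂ / r over all segment pairs, which is one standard way (often called the Neumann form) of evaluating the mutual inductance M. The loop radius, separations, segment count, and the classical value of μ₀ are all assumptions made for illustration, and the second loop is simply tilted about the x axis with its centre on the axis of the first.

```python
# Mutual inductance of two circular loops from the 1/r Green's function of Eq. 4.8,
# evaluated as a double sum over short straight segments (the Neumann form
# M = mu0/(4 pi) * sum of dl1.dl2 / r). Geometry and discretization are assumptions.
import math

MU_0 = 4e-7 * math.pi   # permeability of free space in H/m (classical value, assumed)

def loop_segments(radius, center_z, tilt, n=200):
    """Return (midpoint, segment vector) pairs for a circle tilted about the x axis."""
    segs = []
    h = 2.0 * math.pi / n
    for k in range(n):
        t = (k + 0.5) * h
        c, s = math.cos(t), math.sin(t)
        mid = (radius * c,
               radius * s * math.cos(tilt),
               center_z + radius * s * math.sin(tilt))
        dl = (-radius * s * h,
              radius * c * math.cos(tilt) * h,
              radius * c * math.sin(tilt) * h)
        segs.append((mid, dl))
    return segs

def mutual_inductance(radius, separation, tilt=0.0, n=200):
    """Sum dl1.dl2 / r over all segment pairs of two loops of equal radius."""
    loop1 = loop_segments(radius, 0.0, 0.0, n)
    loop2 = loop_segments(radius, separation, tilt, n)
    total = 0.0
    for p1, d1 in loop1:
        for p2, d2 in loop2:
            r = math.dist(p1, p2)
            total += (d1[0] * d2[0] + d1[1] * d2[1] + d1[2] * d2[2]) / r
    return MU_0 / (4.0 * math.pi) * total

a = 0.05  # loop radius in metres (assumed)
M_near = mutual_inductance(a, 0.5 * a)
M_far = mutual_inductance(a, 2.0 * a)
M_orthogonal = mutual_inductance(a, 2.0 * a, tilt=math.pi / 2)
M_flipped = mutual_inductance(a, 2.0 * a, tilt=math.pi)
print(M_near, M_far, M_orthogonal, M_flipped)
```

Under these assumptions the printed values should mirror observations v and vi: the coupling weakens with distance, vanishes (to rounding error) for the orthogonal arrangement, and changes sign when one loop is flipped.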


We can express Eq. 4.2 in a way that gives us additional insight into the energy 
stored in the coil: 


$$W = \int V\,dQ = \int V I\,dt = \int I\,d\Phi \tag{4.9}$$


Eq. 4.9 is valid for any A; it is not limited to the A from the current in the coil itself. 
The integrals in Eq. 4.9 involve the entire coil. From them we can take a conceptual 
step and, using our visualization of the current density, imagine an energy density 
J · A ascribed to every point in space: 


$$W = \int I\,\mathbf{A} \cdot d\mathbf{l} = \int \mathbf{J} \cdot \mathbf{A}\,d\mathrm{vol} \tag{4.10}$$


4.6 Electrodynamic Momentum 


Feynman commented on the irrelevance of the concept of force in a quantum context. 
At the fundamental level, we can understand the behavior of a quantum system 
using only the wave properties of matter. But we experience forces between currents 
in every encounter with electric motors, relays, and other electromagnetic actuators. 
How do these forces arise from the underlying quantum reality? We can make a 
connection between the classical concept of force and the quantum nature of matter 
through the concept of momentum. Using the deBroglie postulate relating the 
momentum p of an electron to the propagation vector k of the wave function, and 
identifying the two integrands in Eq. 4.6, the electrodynamic momentum of an 
elemental charge is 


$$\mathbf{p} = \hbar\mathbf{k} = q_0\mathbf{A} \tag{4.11}$$


We shall now investigate the electrodynamic momentum in one of our loops of 
superconducting wire. There is an electric field E along the loop, the line integral 
of which is the voltage V between the ends. From a classical point of view, Newton’s 
law tells us that the force q₀E on a charge should be equal to the time rate of change 
of momentum. From Eq. 4.11, 


$$q_0\mathbf{E} = \frac{\partial \mathbf{p}}{\partial t} = q_0 \frac{\partial \mathbf{A}}{\partial t} \quad\Rightarrow\quad V = \oint \mathbf{E} \cdot d\mathbf{l} = \frac{\partial \Phi}{\partial t} \tag{4.12}$$


Integrating the second form of Eq. 4.12 with respect to time, we recover Eq. 4.6, 
so the classical idea of inertia is indeed consistent with the quantum behavior of 
our collective system. Electrodynamic inertia acts exactly as a classical mechanical 
inertia: It relates the integral of a force to a momentum, which is manifest as a 
current. We note that, for any system of charges that is overall charge neutral, as 
is our superconductor, the net electromagnetic momentum is zero. For the −qA of 
each electron, we have a canceling +qA from one of the background positive charges. 
The electric field that accelerates electrons in one direction exerts an equal force in 
the opposite direction on the background positive charges. We have, however, just 
encountered our first big surprise: We recognize the second form of Eq. 4.12, which 
came from Newton's law, as the integral form of one of Maxwell’s equations! 


We would expect the total momentum P of the collective electron system to be 
the momentum per charge times the number of charges in the loop. If there are η 
charges per unit length of wire that take part in the motion, integrating Eq. 4.11 
along the loop gives 


$$P = \eta q_0 \oint \mathbf{A} \cdot d\mathbf{l} = \eta q_0 \Phi = \eta q_0 L I \tag{4.13}$$


The current I is carried by the η charges per unit length moving at velocity v; 
therefore, I = ηq₀v, and Eq. 4.13 becomes 


$$P = L (\eta q_0)^2 v \tag{4.14}$$


The momentum is proportional to the velocity, as it should be. It is also proportional 
to the size of the loop, as reflected by the inductance L. Here we have our second 
big surprise: instead of scaling linearly with the number of charges that take part 
in the motion, the momentum of a collective system scales as the square of the 
number of charges! We can understand this collective behavior as follows. In 
an arrangement where charges are constrained to move in concert, each charge 
produces phase accumulation, not only for itself, but for all the other charges as 
well. So the inertia of each charge increases linearly with the number of charges 
moving in concert. The inertia of the ensemble of coupled charges must therefore 
increase as the square of the number of charges. 


4.7 Forces on Currents 


In our experiments on coupled loops, we have already seen how the current in 
one loop induces phase accumulation in another loop; the relations involved were 
captured in Eq. 4.7. In any situation where we change the coupling of collective 
systems by changing the spatial arrangement, mechanical work may be involved. 
Our model system for studying this interaction consists of two identical shorted 
loops of individual inductance L₀, each carrying a persistent flux Φ. As long as 
the superconducting state retains its integrity, the cyclic constraint on the wave 
function guarantees that the flux Φ in each loop will be constant, independent of 
the coupling between loops. Because M enters symmetrically in Eq. 4.7, the current 
I will be the same in both loops. Hence, L₀ and Φ will remain constant, whereas M 
and I will be functions of the spatial arrangement of the loops: M will be large 
and positive when the loops are brought together with their currents flowing in the 
same direction, and will be large and negative when the loops are brought together 
with their currents flowing in opposite directions. From Eq. 4.7, Φ = (L₀ + M)I. 
Substituting Φ into Eq. 4.9, and noting that the total energy of the system is twice 
that for a single coil, 


$$W = 2 \int I\,d\Phi = (L_0 + M) I^2 = \frac{\Phi^2}{L_0 + M} \tag{4.15}$$


The force F_z along some direction z is defined as the rate of change of energy with 
a change in the corresponding coordinate: 


$$F_z = \frac{\partial W}{\partial z} = -\left(\frac{\Phi}{L_0 + M}\right)^2 \frac{\partial M}{\partial z} \tag{4.16}$$


The negative sign indicates an attractive force because the mutual inductance M 
increases as the coils — whose currents are circulating in the same direction — 
are moved closer. It is well known that electric charges of the same sign repel 
each other. We might expect the current, being the spatial analog of the charge, to 
behave in a similar manner. However, Eq. 4.15 indicates that the total energy of the 
system decreases as M increases. How does this attractive interaction of currents 
circulating in the same direction come about? 


The electron velocity is proportional to I. As M is increased, the electrons 
in both loops slow down because they have more inertia due to the coupling with 
electrons in the other loop. This effect is evident in Eq. 4.15, where I = Φ/(L₀ + M). 
Thus, there are two competing effects: The decrease in energy due to the lower 
velocity, and the increase in energy due to the increase in inertia of each electron. 
The energy goes as the square of the velocity, but goes only linearly with the inertia, 
so the velocity wins. The net effect is a decrease in energy as currents in the same 
direction are coupled, and hence an attractive force. We can see how the classical 
force law discovered in 1823 by Ampère arises naturally from the collective quantum 
behavior, which determines not only the magnitude, but also the sign, of the effect. 
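

The step from the energy of Eq. 4.15 to the force of Eq. 4.16 can be checked by finite differences. In the sketch below, the dependence of M on the separation z is a made-up exponential model, and the values of L₀, Φ, and the decay length are arbitrary; none of these come from the text. Only the relation between W and its derivative is being tested.

```python
# Finite-difference check of Eq. 4.16 against the energy of Eq. 4.15.
# The dependence M(z) below is a made-up smooth model, used only for illustration.
import math

L0 = 1e-6      # self-inductance of each loop, henry (assumed)
PHI = 1e-9     # persistent flux in each loop, volt-second (assumed)
Z0 = 0.01      # decay length of the hypothetical coupling model, metres

def mutual(z):
    """Hypothetical coupling: M falls from L0 toward zero as the separation z grows."""
    return L0 * math.exp(-z / Z0)

def dmutual_dz(z):
    """Analytic derivative of the hypothetical M(z)."""
    return -mutual(z) / Z0

def energy(z):
    """Eq. 4.15: W = Phi^2 / (L0 + M)."""
    return PHI**2 / (L0 + mutual(z))

def force_formula(z):
    """Eq. 4.16: F_z = dW/dz = -(Phi / (L0 + M))^2 dM/dz."""
    return -(PHI / (L0 + mutual(z)))**2 * dmutual_dz(z)

z = 0.02       # separation at which to compare, metres
dz = 1e-7      # step for the central difference
force_numeric = (energy(z + dz) - energy(z - dz)) / (2 * dz)
print(force_numeric, force_formula(z))
```

In this model z is the separation, so M decreases with z and the energy rises as the loops are pulled apart; that is the attractive interaction of same-direction currents described above.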


4.8 Multiturn Coils 


The interaction in a collective system scales as the square of the number of elec- 
trons moving in concert. Thus, we might expect the quantum scaling laws to be 
most clearly manifest in the properties of closely coupled multiturn coils, where 
the number of electrons is proportional to the number of turns. We can construct 
an N-turn coil by connecting in series N identical, closely coupled loops. In this 
arrangement, the current through all loops is equal to the current I through the 
coil, and the voltage V across the coil is equal to the sum of the individual voltages 
across the loops. If A₀ is the vector potential from the current in one loop, we 
expect the vector potential from N loops to be NA₀, because the current in each 
loop contributes. The flux integral is taken around N turns, so the path is N times 
the length l₀ of a single turn. The total flux integral is thus 


$$\Phi = \int V\,dt = \int_0^{N l_0} N \mathbf{A}_0 \cdot d\mathbf{l} = N^2 L_0 I \tag{4.17}$$


From Eq. 4.17 we conclude that an N-turn closely coupled coil has an inductance 
L = N²L₀. Once again, we see the collective interaction scaling as the square of 
the number of interacting charges. We remarked that collective quantum systems 
have a correspondence limit markedly different from that of classical mechanical 
systems. When two classical massive bodies, each body having a separate inertia, 
are bolted together, the inertia of the resulting composite body is simply the sum 
of the two individual inertias. The inertia of a collective system, however, is a 
manifestation of the interaction, and cannot be assigned to the elements separately. 
This difference between classical and quantum systems has nothing to do with the 
size scale of the system. Eq. 4.17 is valid for large as well as for small systems; 
it is valid where the total phase accumulation is an arbitrary number of cycles — 
where the granularity of the flux due to ħ is as small as might be required by any 
correspondence procedure. Thus, it is clear that collective quantum systems do not 
have a classical correspondence limit. 


4.9 Total Momentum 


To see why our simplistic approach has taken us so far, we must understand the 
current distribution within the superconductor itself. We saw that the vector po- 
tential made a contribution to the momentum of each electron, which we called the 
electrodynamic momentum: $p_{el} = q_0 A$. The mass m of an electron moving with
velocity v also contributes to the electron’s momentum: $p_{mv} = mv$. The total
momentum is the sum of these two contributions: 


$$\hbar k = p = p_{el} + p_{mv} = q_0 A + m v \qquad (4.18)$$


The velocity $v = (\hbar k - q_0 A)/m$ is thus a direct measure of the imbalance between
the total momentum $\hbar k$ and the electrodynamic momentum $q_0 A$. When these two
quantities are matched, the velocity is zero. The current density is just the motion
of N elementary charges per unit volume: $J = q_0 N v$. We can thus express Eq. 4.18
in terms of the wave vector k, the vector potential A, and the current density J:


$$J = \frac{q_0 N}{m}\,(\hbar k - q_0 A) \qquad (4.19)$$


4.10 Current Distribution 


We are now in a position to investigate how current distributes itself inside a su- 
perconductor. If A were constant throughout the wire, the motion of the electrons 
would be determined by the common wave vector k of the collective electron sys- 
tem, and we would expect the persistent current for a given flux to be proportional 
to the cross-sectional area of the wire, and thus the inductance L of a loop of wire 
to be inversely related to the wire cross section. When we perform experiments on 
loops of wire that have identical paths in space, however, we find that the induc- 
tance is only a weak function of the wire diameter, indicating that the current is 
not uniform across the wire, and therefore that A is far from constant. If we make 
a loop of superconducting tubing, instead of wire, we find that it has exactly the 
same inductance as does a loop made with wire of the same diameter, indicating
that current is flowing at the surface of the loop, but is not flowing throughout the
bulk.


Before taking on the distribution of current in a wire, we can examine a simpler 
example. In a simply connected bulk superconductor, the single-valued nature of 
the wave function can be satisfied only if the phase is everywhere the same: k = 0. 
Any phase accumulation induced through the A vector created by an external 
current will be canceled by a screening current density J in the opposite direction, 
as we saw in observations iii and iv. To make the problem tractable, we consider
a situation where a vector potential $A_0$ at the surface of a bulk superconducting
slab is created by distant currents parallel to the surface of the slab. The current 
distribution perpendicular to the surface is a highly localized phenomenon, so it 
is most convenient to use the differential formulation of Eq. 4.8. We suppose that 
conditions are the same at all points on the surface, and therefore that A changes in 
only the x direction, perpendicular to the surface, implying that $\nabla^2 A = \partial^2 A/\partial x^2$.
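
The differential equation whose solution is quoted next can be sketched as follows. This is a reconstruction under a stated assumption: Eq. 4.8 is not reproduced in this section, and the sketch takes it to have the form $\nabla^2 A = -\mu_0 J$. With k = 0 in the bulk, Eq. 4.19 reduces to $J = -(q_0^2 N/m)\,A$, so the one-dimensional problem becomes:

```latex
\frac{\partial^2 A}{\partial x^2} = -\mu_0 J = \frac{\mu_0 q_0^2 N}{m}\,A
```

Of the two exponential solutions, only the one that decays with depth x is physically relevant, since A must remain finite deep inside the slab.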
The solution to Eq. 4.20 is

$$A = A_0\,e^{-x/\lambda}, \qquad \lambda^2 = \frac{m}{\mu_0 q_0^2 N} \qquad (4.21)$$


The particular form of Eq. 4.21 depends on the geometry, but the qualitative result
is always the same, and can be understood as follows: The current is the imbalance
between the wave vector and the vector potential. When an imbalance exists, a
current proportional to that imbalance will flow such that it cancels out the
imbalance. The resulting screening current dies out exponentially with distance from
the source of imbalance. The distance scale at which the decay occurs is given by
$\lambda$, the screening distance, penetration depth, or skin depth. For a typical
superconductor, N is of the order of $10^{28}/\mathrm{m}^3$, so $\lambda$ should be a few tens of nanometers.
Experimentally, simple superconductors have $\lambda \approx 50$ nanometers, many orders
of magnitude smaller than the macroscopic wire thickness that we are using.
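
The estimate of $\lambda$ can be checked numerically. The following is a minimal sketch, assuming the free-electron charge and mass for $q_0$ and m (the text does not say which values were used) and the quoted density of $10^{28}$ per cubic meter; the function names are illustrative only.

```python
import math

MU_0 = 4e-7 * math.pi   # vacuum permeability, H/m
Q_E = 1.602e-19         # elementary charge, C (assumed value for q0)
M_E = 9.109e-31         # electron mass, kg (assumed value for m)


def penetration_depth(density, charge=Q_E, mass=M_E):
    """Screening length from Eq. 4.21: lambda^2 = m / (mu_0 q0^2 N)."""
    return math.sqrt(mass / (MU_0 * charge ** 2 * density))


def vector_potential(x, a_surface, lam):
    """A(x) = A_0 exp(-x / lambda) at depth x below the surface."""
    return a_surface * math.exp(-x / lam)


if __name__ == "__main__":
    lam = penetration_depth(1e28)
    print(f"lambda = {lam * 1e9:.1f} nm")
    for depth in (0.0, lam, 5 * lam):
        print(f"A/A_0 at x = {depth * 1e9:6.1f} nm: "
              f"{vector_potential(depth, 1.0, lam):.4f}")
```

With these assumed constants the result is a few tens of nanometers, in line with the experimental figure quoted above; taking paired charge carriers instead changes the value by a factor of order one.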


4.11 Current in a Wire 


At long last, we can visualize the current distribution within the superconducting 
wire itself. Because the skin depth is so small, the surface of the wire appears flat 
on that scale, and we can use the solution for a flat surface. The current will be a 
maximum at the surface of the wire, and will die off exponentially with distance into 
the interior of the wire. We can appreciate the relations involved by examining a 
simple example. A 10-cm-diameter loop of 0.1-mm-diameter wire has an inductance
of $4.4 \times 10^{-7}$ Henry (p. 193 in ref. [11]). A persistent current of 1 Ampere in this
loop produces a flux of $4.4 \times 10^{-7}$ volt-second, which is $2.1 \times 10^{8}$ flux quanta. The
electron wave function thus has a total phase accumulation of $2.1 \times 10^{8}$ cycles along
the length of the wire, corresponding to a wave vector $k = 4.25 \times 10^{9}\ \mathrm{m}^{-1}$. Due to
the cyclic constraint on the wave function, this phase accumulation is shared by all 
electrons in the wire, whether or not they are carrying current. 


In the region where current is flowing, the moving mass of the electrons
contributes to the total phase accumulation. The 1 Ampere of current results from a
current density of $6.4 \times 10^{10}$ Amperes per square meter flowing in a thin “skin”
$\approx \lambda$ thick, just inside the surface. This current density is the result of the $10^{28}$ electrons
per cubic meter moving with a velocity of $v \approx 20$ meters per second. The mass
of the electron moving at this velocity contributes $mv/\hbar = 1.7 \times 10^{5}\ \mathrm{m}^{-1}$ to the
total wave vector of the wave function, which is less than one part in $10^4$ of that
contributed by the vector potential. That small difference, existing in about 1 part
in $10^3$ of the cross-sectional area, is enough to bring k and A into balance in the
interior of the wire. 
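
The chain of numbers in this example can be retraced in a few lines. The sketch below is a reader's check under assumptions, not the author's calculation: it takes $q_0$ to be twice the elementary charge and m to be the electron mass, because those choices reproduce the quoted figures, and it models the current-carrying skin as a thin annulus of thickness $\lambda$ = 50 nm.

```python
import math

H_PLANCK = 6.626e-34               # Planck constant, J s
HBAR = H_PLANCK / (2 * math.pi)    # reduced Planck constant, J s
Q_E = 1.602e-19                    # elementary charge, C
M_E = 9.109e-31                    # electron mass, kg


def wire_loop_numbers(loop_diameter=0.10, wire_diameter=1.0e-4,
                      inductance=4.4e-7, current=1.0,
                      skin_depth=50e-9, density=1e28,
                      q0=2 * Q_E, mass=M_E):
    """Retrace the worked example in SI units.

    q0 = 2e and mass = m_e are assumptions chosen because they
    reproduce the figures quoted in the text.
    """
    flux = inductance * current                          # volt-seconds
    flux_quanta = flux / (H_PLANCK / q0)                 # phase cycles around the loop
    path_length = math.pi * loop_diameter                # length of the wire, m
    k_total = 2 * math.pi * flux_quanta / path_length    # wave vector, 1/m
    skin_area = math.pi * wire_diameter * skin_depth     # thin annulus, m^2
    current_density = current / skin_area                # A/m^2
    velocity = current_density / (q0 * density)          # carrier speed, m/s
    k_mass = mass * velocity / HBAR                      # mv/hbar, 1/m
    return {
        "flux_quanta": flux_quanta,
        "k_total": k_total,
        "current_density": current_density,
        "velocity": velocity,
        "k_mass": k_mass,
        "k_ratio": k_mass / k_total,
    }


if __name__ == "__main__":
    for name, value in wire_loop_numbers().items():
        print(f"{name:16s} {value:.3g}")
```

The ratio `k_ratio` is the fraction of the wave vector carried by the moving mass; it comes out below one part in $10^4$, as stated.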


In the interior of the wire, the propagation vector of the wave function is matched 
to the vector potential, and the current is therefore zero. As we approach the 
surface, A decreases slightly, and the difference between k and $A q_0/\hbar$ is manifest
as a current. At the surface, the value and radial slope of A inside and outside
the wire match, and the value of A is still within one part in $10^4$ of that in the
center of the wire. So our simplistic view, that the vector potential and the wave
vector were two representations of the same quantity, is precisely true in the
center of the wire, and is nearly true even at the surface. The current I is not the
propagation vector k of the wave, but, for a fixed configuration, I is proportional to
k by Eqs. 4.8 and 4.19. For that reason, we were able to deduce the electromagnetic
laws relating current and voltage from the quantum relations between wave vector 
and frequency. 


4.12 Conclusion 


We took to heart Einstein’s belief that the electrons and the fields were two as- 
pects of the same reality, and were able to treat the macroscopic quantum system 
and the electromagnetic field as elements of a unified subject. We heeded Mach’s 
advice that classical mechanics was not the place to start, followed Feynman’s di- 
rective that interactions change the wavelengths of waves, and saw that there is a 
correspondence limit more appropriate than the classical-mechanics version used in 
traditional introductions to quantum theory. We found Newton’s law masquerad- 
ing as one of Maxwell’s equations. We were able to derive a number of important 
results using only the simplest properties of waves, the Einstein postulate relating 
frequency to energy, the de Broglie postulate relating momentum to wave vector,
and the discrete charge of the electron. It thus appears possible to formulate a uni- 
fied, conceptually correct introduction to both the quantum nature of matter and 
the fundamental laws of electromagnetic interaction without using either Maxwell’s 
equations or standard quantum formalism. 


I am indebted to Richard F. Lyon, Sanjoy Mahajan, William B. Bridges, Rahul 
Sarpeshkar, Richard Neville, and Lyn Dupre for helpful discussion and critique of the 
material, and to Calvin Jackson for his help in preparing the manuscript. The work was 
supported by the Arnold and Mabel Beckman Foundation, and by Gordon and Betty 
Moore. 
A MEMORY


Gerald Jay Sussman 


I met and worked with Richard P. Feynman when I was on sabbatical leave 
from MIT for the year 1983-1984. At MIT I had worked in various aspects of 
Computer Science, including Artificial Intelligence, Computer Architecture, and 
High-Level Programming Languages. Hal Abelson and I had been developing an 
exciting and novel introductory subject for our Department entitled “Structure and 
Interpretation of Computer Programs.” In fact, we had just produced a first draft 
of our book by that name as an MIT AI Laboratory Technical Report. This report 
came to be called the “Yellow Wizard Book”, because of the picture on its yellow 
cover. Richard’s son, Carl, had been my student at MIT. 


I went to Caltech to learn about dynamical astrophysics, a perennial interest of 
mine. My friend and former teacher, Professor Alar Toomre of the MIT Department 
of Mathematics, had arranged that I would work with Peter Goldreich’s group for 
that year. My wife, Julie, and I went to Caltech in the Spring to look into the 
housing situation for our stay. Mr. Feynman met us for lunch at the Athenaeum 
(the Caltech Faculty Club). Apparently he had read the Yellow Wizard Book and
liked it — he said that he learned a lot from reading it. At some point Feynman, 
in his characteristic gruff voice, said “My son says you’re a pretty good teacher. I
am going to teach a course on computing — on the ultimate physical limitations 
of computation, and on the computational aspects of physics. Will you help me?” 
Now, I had been planning on learning about astrophysics for that year, so I was 
a bit leery of taking on a teaching assignment, but the prospect of working with 
Feynman was difficult to resist. I asked him when the course met and he said that 
it was on Monday, Wednesday and Friday, at 11:00 to 12:00, for the entire year. I 
replied that I would help with the subject if he would eat lunch with me afterwards. 
He agreed, and that was one of the best deals I ever made in my life. 


So, during the 1983-1984 school year I helped Richard Feynman develop and 
teach a subject entitled “Potentialities and Limitations of Computing Machines” 
at Caltech. Our class consisted mostly of graduate students in physics and in com- 
puter science. We had a lot of fun. We taught students how to program, in Scheme, 
using the Yellow Wizard Book. I provided a computing laboratory component for 
the subject, using a Caltech computer. Richard really loved to program: he stayed 
up one night with me learning how to write power-series manipulations using in- 
finite streams. We taught a bit about hardware design using MOS circuits. We 
investigated the limitations of size for circuits based on lithography and methods 
such as ion implantation and diffusion. We estimated how much variation we could
expect in the parameters of an inverter as the transistors became smaller. We told
students a bit of information theory (up to Shannon’s theorem) and about error- 
correcting codes. Richard really liked the recursive application of these ideas. We 
examined the thermodynamic arguments and we discussed algorithmic information. 
We looked at the ideas of Landauer, Fredkin, Toffoli, Margolus, and Bennett, con- 
cerning the reversibility of computation and what that could mean physically. From 
here we were led to cellular automata. We discussed classic examples. We examined 
how such systems could be universal and in what sense they could be models for 
the physical world. 


We also discussed what was then known about the quantum mechanics of com- 
putation. Richard explained his work on the computational difficulty of simulating 
a quantum-mechanical system with a classical computer. He also gave a few lectures 
in which he developed a quantum-mechanical state machine, composed of a PLA 
(programmed logic array) built up from operators, and a state register described 
in terms of spin states. We estimated the cycle time that could be expected from 
diffusion of the state, and how energy could be used to speed things up. 


While this class was under way, I assembled a group to help me build the Digital 
Orrery, a special-purpose computer designed to do high-precision integrations for 
orbital-mechanics experiments. Using the Digital Orrery, I later worked with Jack 
Wisdom to discover numerical evidence for chaotic motions in the outer planets. 
The Digital Orrery is now retired at the Smithsonian Institution in Washington, 
D.C. One serious problem with long integrations is the buildup of numerical error. 
Before beginning the design I assumed that the software would be easy, and that the 
complete numerical analysis for such a venerable problem would be in the textbooks. 
Although he did not actually contribute to the design or to the development of the 
algorithms we ultimately used, Richard put out significant effort to help me to 
understand this problem, just how bad it was, and how little was known about its 
solution. 


My last encounter with Richard Feynman was not long before he died. Richard 
and Carl ate dinner with Julie and me at our house. We spent the rest of the evening 
practicing lock picking with locks from my rather extensive collection. I still sorely 
miss my friend, Richard Feynman. When thinking about interesting ideas, from
either physics or computation, I often ask myself, “What would Feynman think of
this?”
NUMERICAL EVIDENCE THAT THE MOTION
OF PLUTO IS CHAOTIC 


Gerald Jay Sussman and Jack Wisdom * 


Abstract 


The Digital Orrery has been used to perform an integration of the motion of the 
outer planets for 845 million years. This integration indicates that the long-term 
motion of the planet Pluto is chaotic. Nearby trajectories diverge exponentially 
with an e-folding time of only about 20 million years. 


6.1 Introduction 


The determination of the stability of the solar system is one of the oldest problems 
in dynamical astronomy but despite considerable attention all attempts to prove 
the stability of the system have failed. Arnold has shown that a large proportion 
of possible solar systems are quasiperiodic if the masses, and orbital eccentricities 
and inclinations, of the planets are sufficiently small [1]. The actual solar system,
however, does not meet the stringent requirements of the proof. Certainly, the 
great age of the solar system suggests a high level of stability, but the nature of 
the long-term motion remains undetermined. The apparent analytical complexity 
of the problem has led us to investigate the stability by means of numerical models. 
We have investigated the long-term stability of the solar system through an 845- 
million-year numerical integration of the five outermost planets with the Digital 
Orrery [2], a special-purpose computer for studying planetary motion.


Pluto’s orbit is unique among the planets. It is both highly eccentric (e = 0.25) 
and highly inclined (i ~ 16°). The orbits of Pluto and Neptune cross one another, 
a condition permitted only by the libration of a resonant argument associated with 
the 3:2 commensurability between the orbital periods of Pluto and Neptune. This 
resonance, which has a libration period near 20,000 years [3], ensures that Pluto 
is far from perihelion when Pluto and Neptune are in conjunction. Pluto also 
participates in a resonance involving its argument of perihelion, the angle between 
the ascending node and the perihelion, which librates about $\pi/2$ with a period of 3.8
million years [4]. This resonance guarantees that the perihelion of Pluto’s orbit is 
far from the line of intersection of the orbital planes of Pluto and Neptune, further 
ensuring that close encounters are avoided. 


*Reprinted from Science, Vol.241, Research Articles, pp. 433-437, 22 July 1988 




We found in our 200-million-year integrations of the outer planets [5] that Pluto’s 
orbit also undergoes significant variations on much longer time scales. The libration 
of the argument of perihelion is modulated with a period of 34 million years, and 
$h = e \sin\varpi$, where e is the eccentricity and $\varpi$ is the longitude of perihelion, shows
significant long-period variations with a period of 137 million years. The appear- 
ance of the new 34-million-year period might have been expected, because Pluto 
must have two independent long-period frequencies, but the 137-million-year period 
was completely unexpected. It results from a near commensurability between the 
frequency of circulation of Pluto’s ascending node and one of the principal secular 
frequencies of the massive planets. Pluto also participates in two other resonances 
involving the frequency of oscillation of the argument of perihelion and the principal 
secular frequencies. In our 200-million-year integration Pluto’s inclination appeared 
to have even longer periods or possibly a secular decrease. 


The similarity of Pluto’s peculiar highly eccentric and inclined orbit to chaotic 
asteroid orbits [6], together with the very long periods, Pluto’s participation in a 
large number of resonances, and the possible secular decline in inclination compelled 
us to carry out longer integrations of the outer planets to clarify the nature of the 
long-term evolution of Pluto. Our new numerical integration indicates that in fact 
the motion of the planet Pluto is chaotic. 


6.2 Deterministic chaotic behavior 


In most conservative dynamical systems Newton’s equations have both regular solu- 
tions and chaotic solutions. For some initial conditions the motion is quasiperiodic; 
for others the motion is chaotic. Chaotic behavior is distinguished from quasiperiodic
behavior by the way in which nearby trajectories diverge [6, 7]. Nearby
quasiperiodic trajectories diverge linearly with time, on average, whereas nearby 
chaotic trajectories diverge exponentially with time. Quasiperiodic motion can be 
reduced to motion on a multidimensional torus; the frequency spectrum of quasiperi- 
odic motion has as many independent frequencies as degrees of freedom. The fre- 
quency spectrum of chaotic motion is more complicated, usually appearing to have 
a broad-band component. 


The Lyapunov exponents measure the average rates of exponential divergence 
of nearby orbits. The Lyapunov exponents are limits for large time of the quantity
$\gamma = \ln(d/d_0)/(t - t_0)$, where d is the distance in phase space between the trajectory
and an infinitesimally nearby test trajectory, and t is the time. For any particular
trajectory of an n-dimensional system there can be n distinct Lyapunov exponents, 
depending on the phase-space direction from the reference trajectory to the test 
trajectory. In Hamiltonian systems the Lyapunov exponents are paired; for each 
non-negative exponent there is a non-positive exponent with equal magnitude. Thus 
an m-degree-of-freedom Hamiltonian system can have at most m positive exponents. 
For chaotic trajectories the largest Lyapunov exponent is positive; for quasiperiodic
trajectories all of the Lyapunov exponents are zero.


Fig. 6.1. The exponential divergence of nearby trajectories is indicated by the average
linear growth of the logarithms of the distance measures as a function of time. In the
upper trace we see the growth of the variational distance around a reference trajectory
(left vertical axis). In the lower trace we see how two Plutos diverge with time (right
vertical axis). The distance saturates near 45 AU; note that the semimajor axis of Pluto’s
orbit is about 40 AU. The variational method of studying neighboring trajectories does
not have the problem of saturation. Note that the two methods are in excellent agreement
until the two-trajectory method has nearly saturated.


Lyapunov exponents can be estimated from the time evolution of the phase-space
distance between a reference trajectory and nearby test trajectories [7, 8].
The most straightforward approach is to simply follow the trajectories of a small 
cloud of particles started with nearly the same initial conditions. With a sufficiently 
long integration we can determine if the distances between the particles in the cloud 
diverge exponentially or linearly. If the divergence is exponential, then for each pair 
of particles in the cloud we obtain an estimate of the largest Lyapunov exponent. 
With this method the trajectories eventually diverge so much that they no longer 
sample the same neighborhood of the phase space. We could fix this by periodically 
rescaling the cloud to be near the reference trajectory, but we can even more directly 
study the behavior of trajectories in the neighborhood of a reference trajectory 
by integrating the variational equations along with the reference trajectory. In 
particular, let $y' = f(y)$ be an autonomous system of first-order ordinary differential
equations and y(t) be the reference trajectory. We define a phase-space variational
trajectory $y + \delta y$ and note that $\delta y$ satisfies a linear system of first-order ordinary
differential equations with coefficients that depend on y(t), $\delta y' = J \cdot \delta y$, where the
elements of the Jacobian matrix are $J_{ij} = \partial f_i/\partial y_j$.
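
The variational method described above can be illustrated on a toy problem. The following is a minimal sketch, not the Digital Orrery procedure: it integrates y and $\delta y$ together with a classical fourth-order Runge-Kutta step (an assumption made for brevity; the integrations reported here used a different integrator) and reports $\gamma = \ln(d/d_0)/(t - t_0)$. All names are illustrative. Because the variational equations are linear in $\delta y$, rescaling $\delta y$ to unit length and accumulating the logarithms is exact; it only guards against floating-point overflow.

```python
import math


def rk4_step(deriv, state, h):
    """One classical fourth-order Runge-Kutta step for state' = deriv(state)."""
    k1 = deriv(state)
    k2 = deriv([s + 0.5 * h * k for s, k in zip(state, k1)])
    k3 = deriv([s + 0.5 * h * k for s, k in zip(state, k2)])
    k4 = deriv([s + h * k for s, k in zip(state, k3)])
    return [s + h / 6.0 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def largest_lyapunov(f, jacobian, y0, dy0, h, n_steps, renorm_every=100):
    """Estimate gamma by integrating y' = f(y) together with dy' = J(y) dy."""
    n = len(y0)

    def deriv(state):
        y, dy = state[:n], state[n:]
        jac = jacobian(y)
        ddy = [sum(jac[i][j] * dy[j] for j in range(n)) for i in range(n)]
        return list(f(y)) + ddy

    norm0 = math.sqrt(sum(c * c for c in dy0))
    state = list(y0) + [c / norm0 for c in dy0]
    log_growth = 0.0
    for step in range(1, n_steps + 1):
        state = rk4_step(deriv, state, h)
        if step % renorm_every == 0 or step == n_steps:
            norm = math.sqrt(sum(c * c for c in state[n:]))
            log_growth += math.log(norm)          # accumulate ln(d/d_0)
            state[n:] = [c / norm for c in state[n:]]
    return log_growth / (n_steps * h)


# Toy systems with known answers (hypothetical test cases, not from the text).
def saddle(y):
    return [y[0], -y[1]]


def saddle_jacobian(y):
    return [[1.0, 0.0], [0.0, -1.0]]


def oscillator(y):
    return [y[1], -y[0]]


def oscillator_jacobian(y):
    return [[0.0, 1.0], [-1.0, 0.0]]


if __name__ == "__main__":
    print(largest_lyapunov(saddle, saddle_jacobian,
                           [0.1, 0.1], [1.0, 1.0], 0.01, 2000))
    print(largest_lyapunov(oscillator, oscillator_jacobian,
                           [1.0, 0.0], [1.0, 0.0], 0.01, 2000))
```

The linear saddle has exponents +1 and -1, so the estimate approaches 1 from below as the integration lengthens; the harmonic oscillator is regular and gives an estimate near zero.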
Fig. 6.2. The conventional representation of the Lyapunov exponent calculation, the
logarithm of $\gamma$ versus the logarithm of time. Convergence to a positive exponent is indicated
by a leveling off; for regular trajectories this plot approaches a line with slope minus one.


6.3 Our numerical experiment 


For many years the longest direct integration of the outer planets was the 1-million- 
year integration of Cohen, Hubbard, and Oesterwinter [9]. Recently several longer 
integrations of the outer planets have been performed [5, 10, 11]. The longest 
was our set of 200-million-year integrations. Our new 845-million-year integration 
is significantly longer and more accurate than all previously reported long-term 
integrations. 


In our new integration of the motion of the outer planets the masses and initial 
conditions are the same as those used in our 200-million-year integrations of the 
outer planets. The reference frame is the invariable frame of Cohen, Hubbard, 
and Oesterwinter. The planet Pluto is taken to be a zero-mass test particle. We 
continue to neglect the effects of the inner four planets, the mass lost by the Sun as a 
result of electromagnetic radiation and solar wind, and general relativity. The most 
serious limitation of our integration is our ignorance of the true masses and initial 
conditions. Nevertheless, we believe that our model is sufficiently representative 
of the actual solar system that its study sheds light on the question of stability 
of the solar system. To draw more rigorous conclusions, we must determine the 
sensitivity of our conclusions to the uncertainties in masses and initial conditions, 
and to unmodeled effects. 
Fig. 6.3. Common logarithm of the distance between several pairs of Plutos, in AU, versus
the common logarithm of the time, in years. The initial segment of the graph closely fits 
a 3/2 power law (dashed line). The solid line is an exponential chosen to fit the long-time 
divergence of Plutos. The exponential growth takes over when its slope exceeds the slope 
of the power law. 


Our earlier integrations were limited to 100 million years forward and backward 
in time because of the accumulation of error, which was most seriously manifested 
in an accumulated longitude error of Jupiter of order 50°. In our new integrations 
we continue to use the 12th-order Störmer predictor [12], but a judicious choice
of step size has reduced the numerical errors by several orders of magnitude. In 
all of our integrations the error in energy of the system varies nearly linearly with 
time. In the regime where neither roundoff nor truncation error is dominant the 
slope of energy as a function of time depends on step size in a complicated way. 
For some step sizes the energy level has a positive slope; for others the slope is 
negative. This suggests that there might be special step sizes for which there is no 
linear growth of energy error. By a series of numerical experiments we indeed found 
that there are values of the step size where the slope of the linear trend of energy 
vanishes. The special step sizes become better defined as the integration interval of 
the experiments is increased. 


We chose our step size on the basis of a dozen 3-million-year integrations, and 
numerous shorter integrations. For our new long integration we chose the step 
size to be 32.7 days. This seemingly innocuous change from a step size near 40 
days dramatically reduces the slope of the energy error, by roughly three orders 
of magnitude. If the numerical integration were truncation-error dominated, for
which the accumulated error is proportional to $h^n$, where h is the step size and n
is the order of the integrator, then this reduction of step size would improve the
accumulated error by only about a factor of 10.
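
The factor of 10 follows directly from the $h^n$ scaling. A minimal sketch, assuming the earlier step size was exactly 40 days (the text says only "near 40 days") and taking n = 12 for the 12th-order integrator:

```python
def truncation_error_ratio(h_old, h_new, order):
    """Factor by which accumulated truncation error shrinks if error ~ h**order."""
    return (h_old / h_new) ** order


# Step sizes in days; 40.0 is an assumed stand-in for "near 40 days".
predicted = truncation_error_ratio(40.0, 32.7, 12)

# The text later quotes an improvement in energy-error slope of "about 600".
observed = 600.0

if __name__ == "__main__":
    print(f"predicted from h^n scaling: {predicted:.1f}")
    print(f"observed / predicted:       {observed / predicted:.0f}")
```

The gap between the predicted and observed improvement is the point of the argument: the special step size does something that truncation-error scaling alone cannot explain.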


In our new integration the relative energy error (energy minus initial energy
divided by the magnitude of the initial energy) accumulated over 845 million years
is $-2.6 \times 10^{-10}$; the growth of the relative energy error is still very nearly linear with
a slope of $-3.0 \times 10^{-19}\ \mathrm{year}^{-1}$. By comparison the rate of growth of the relative
energy error in our 200-million-year integrations was $1.8 \times 10^{-16}\ \mathrm{year}^{-1}$. The errors
in other integrations of the outer solar system were comparable to the errors in our
200-million-year integrations. The rate of growth of energy error in the 1-million-year
integration of Cohen, Hubbard, and Oesterwinter was $2.4 \times 10^{-16}\ \mathrm{year}^{-1}$. For
the 6-million-year integration of Kinoshita and Nakai [10] the relative energy error
grew at approximately $5 \times 10^{-16}\ \mathrm{year}^{-1}$. For the LONGSTOP integration the growth
of relative energy (as defined in this article) was $-2.5 \times 10^{-16}\ \mathrm{year}^{-1}$. Thus the
rate of growth of energy error in the integration reported here is smaller than all
previous long-term integrations of the outer planets by a factor of about 600.


We verified that this improvement in energy conservation was reflected in a 
corresponding improvement in position and velocity errors by integrating the outer 
planets forward 3 million years and then backward to recover the initial conditions, 
over a range of step sizes. For the chosen step size of 32.7 days the error in recovering 
the initial positions of each of the planets is of order $10^{-5}$ astronomical units (AU)
or about 1500 km. Note that Jupiter has in this time traveled $2.5 \times 10^{15}$ km.


The error in the longitude of Jupiter can be estimated if we assume that the energy
error is mainly in the orbit of Jupiter. The relative energy error is proportional
to the relative error in orbital frequency so the error in longitude is proportional to
the integral of the relative energy error: $\Delta\lambda \approx t\,n\,\Delta E(t)/E$, where n is the mean
motion of Jupiter and t is the time of integration. Because the energy error grows
linearly with time the position error grows with the square of the time. The accu- 
mulated error in the longitude of Jupiter after 100 million years is only about 4 arc 
minutes. This is to be compared with the 50° accumulated error estimated for our 
200-million-year integrations. The error in the longitude of Jupiter after the full 
845 million years is about 5°. 
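
These longitude figures can be reproduced from the quoted energy-error slopes. The text states only a proportionality, so the sketch below supplies the constants as assumptions: a factor 3/2 from Kepler's third law (mean motion scales as the 3/2 power of the orbital energy), a factor 1/2 from integrating a linearly growing error, and a Jupiter period of 11.86 years, which is not given in the text.

```python
import math

JUPITER_PERIOD_YEARS = 11.86   # assumed orbital period of Jupiter


def longitude_error_rad(energy_slope_per_year, t_years,
                        period_years=JUPITER_PERIOD_YEARS):
    """Longitude error for a relative energy error growing linearly in time.

    dn/n = (3/2) dE/E by Kepler's third law; integrating the resulting
    frequency error over time gives the quadratic growth in t.
    """
    mean_motion = 2.0 * math.pi / period_years       # radians per year
    return 1.5 * mean_motion * energy_slope_per_year * t_years ** 2 / 2.0


if __name__ == "__main__":
    new_slope, old_slope = 3.0e-19, 1.8e-16          # magnitudes, per year
    arcmin = math.degrees(longitude_error_rad(new_slope, 1.0e8)) * 60.0
    print(f"new integration, 100 Myr: {arcmin:.1f} arc minutes")
    print(f"new integration, 845 Myr: "
          f"{math.degrees(longitude_error_rad(new_slope, 8.45e8)):.1f} degrees")
    print(f"old integration, 100 Myr: "
          f"{math.degrees(longitude_error_rad(old_slope, 1.0e8)):.0f} degrees")
```

Under these assumptions the estimate lands close to the 4 arc minutes and 5° quoted above, and gives a few tens of degrees for the earlier integrations.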


Fig. 6.4. The orbital element $h = e \sin\varpi$ for Pluto over 845 million years. On this
scale the dominant period (the 3.7-million-year circulation of the longitude of perihelion)
is barely resolved. The most obvious component has a period of 137 million years. The
sampling interval was increased in the second half of our integration.


We have directly measured the integration error in the determination of the
position of Pluto by integrating forward and backward over intervals as long as 3
million years to determine how well we can reproduce the initial conditions. Over
such short intervals the round-trip error in the position of Pluto grows as a power
of the time with an exponent near 2. The error in position is approximately
$1.3 \times 10^{-19}\,t^2$ AU (where t is in years). This growth of error is almost entirely in
the integration of Pluto’s orbit; the round-trip error is roughly the same when we
integrate the whole system and when we integrate Pluto in the field of the Sun
only. It is interesting to note that in the integrations with the 32.7-day step size the
position errors in all the planets are comparable. Extrapolation of the round-trip
error for Pluto over the full 845-million-year integration gives an error in longitude
of less than 10 arc minutes.
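
The extrapolation can be checked with a short calculation. This is a sketch under two assumptions that the text does not spell out: the quadratic error law is applied unchanged out to 845 million years, and the position error is converted to an angle by dividing by an orbital radius of about 40 AU (the semimajor axis mentioned in the caption of Fig. 6.1).

```python
import math


def pluto_position_error_au(t_years, coeff=1.3e-19):
    """Round-trip position error in AU from the fitted law coeff * t**2."""
    return coeff * t_years ** 2


def as_longitude_arcmin(error_au, orbit_radius_au=40.0):
    """Small-angle conversion of an along-orbit error to arc minutes."""
    return math.degrees(error_au / orbit_radius_au) * 60.0


if __name__ == "__main__":
    err = pluto_position_error_au(8.45e8)
    print(f"position error:  {err:.3f} AU")
    print(f"longitude error: {as_longitude_arcmin(err):.1f} arc minutes")
```

With these assumptions the result falls under the 10 arc minutes stated above.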


6.4 Lyapunov exponent of Pluto 


We estimated the largest Lyapunov exponent of Pluto by both the variational and 
the phase-space distance methods during the second half of our 845-million-year 
run. Fig. 6.1 shows the logarithm of the divergence of the phase-space distance in 
a representative two-particle experiment and the growth of the logarithm of the 
variational phase-space distance. We measured the phase-space distance by the 
ordinary Euclidean norm in the six-dimensional space with position and velocity 
coordinates. We measured position in AU and velocity in AU/day. Because the 
magnitude of the velocity in these units is small compared to the magnitude of the 
position, the phase-space distance is effectively equivalent to the positional distance, 
and we refer to phase-space distances in terms of AU. For both traces in this plot the 
average growth is linear, indicating exponential divergence of nearby trajectories 
with an e-folding time of approximately 20 million years. The shapes of these
graphs are remarkably similar until the two-particle divergence grows to about 1
AU, verifying that the motion in the neighborhood of Pluto is properly represented.
A more conservative representation of this data is to plot the logarithm of $\gamma$ versus
the logarithm of time (Fig. 6.2). The leveling off of this graph indicates a positive
Lyapunov exponent.


Fig. 6.5. The inclination i of Pluto over 845 million years. Besides the 34-million-year
component and the 150-million-year component there appears to be a component with a
period near 600 million years.


To study the details of the divergence of nearby trajectories we expand the early 
portion of the two-particle divergence graph (Fig.6.3). The separation between 
particles starts out as a power law with an exponent near 3/2. The square law 
we described earlier estimates the actual total error, including systematic errors in 
the integration process. The 3/2 power law describes the divergence of trajectories 
subject to the same systematic errors. Only after some time does the exponential 
take off. The power law is dominated by the exponential only after the rate of 
growth of the exponential exceeds the rate of growth of the power law. This suggests 
that the portion of the divergence of nearby trajectories that results only from the 
numerical error fits a 3/2 power law and that this error “seeds” the exponential 
divergence that is the hallmark of chaos. We tested this hypothesis by integrating 
a cloud of test particles with the orbital elements of Pluto in the field of the Sun
alone. The divergence of these Kepler “Plutos” grows as $3.16 \times 10^{-17}\, t^{3/2}$ AU.
This is identical to the initial divergence of the Plutos in the complete dynamical 
system, showing that two-body numerical error completely accounts for the initial 
divergence. 
Only the second half of the integration was used in the computation of the
Lyapunov exponents, because the measurement in the first half of our integration
was contaminated by over-vigorous application of the rescaling method, and gave a
Lyapunov exponent about a factor of 4 too large. The rescaling interval was only 
275,000 years, which was far too small. The rescaling interval must be long enough 
that the divergence of neighboring trajectories is dominated by the exponential 
divergence associated with chaotic behavior rather than the power law divergence 
caused by the accumulation of numerical errors. In our experiment the rescaling 
interval should have been greater than 30 million years. 
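The competition between the 3/2 power law and the exponential can be sketched numerically. The snippet below compares logarithmic growth rates: for a power law t^p the rate is p/t, and for exp(t/tau) it is 1/tau, so the exponential grows faster once t exceeds p times tau. This is an illustrative reading rather than the authors' stated derivation, and the helper name `crossover_time` is hypothetical.

```python
def crossover_time(power_law_exponent: float, e_folding_time: float) -> float:
    """Time after which exp(t / tau) outgrows t**p in relative terms.

    d/dt ln(t**p)          = p / t
    d/dt ln(exp(t / tau))  = 1 / tau
    The exponential rate is the larger one once t > p * tau.
    """
    return power_law_exponent * e_folding_time


tau_years = 20e6   # e-folding time quoted in the text
p = 1.5            # exponent of the 3/2 power law quoted in the text

t_cross = crossover_time(p, tau_years)
print(f"exponential growth dominates after about {t_cross / 1e6:.0f} million years")
```

Under these assumptions the crossover falls at 30 million years, which is at least consistent with the minimum rescaling interval quoted above.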


It is important to emphasize that the variational method of measuring the Lya- 
punov exponent has none of these problems. 


6.5 Features of the orbital elements of Pluto 


The largest component in the variation of h (Fig. 6.4) reflects the 3.7-million-year 
regression of the longitude of perihelion. The 27-million-year component we previ- 
ously reported is clearly visible, as is the 137-million-year component. The change 
in density of points reflects a change in the sampling interval. For the first 450 
million years of our integration we recorded the state of the system every 499,983 
days (about 1,369 years) of simulated time. For the second 400 million years we 
sampled 16 times less frequently. 


Besides the major 3.8-million-year component in the variation of the inclination
of Pluto (Fig. 6.5) we can clearly discern the 34-million-year component we
previously reported. Although there is no continuing secular decline in the inclina- 
tion, there is a component with a period near 150 million years and evidence for a 
component with a period of approximately 600 million years. 


The existence of significant orbital variations with such long periods would be 
quite surprising if the motion were quasiperiodic. For quasiperiodic trajectories we 
expect to find frequencies that are low order combinations of a few fundamental 
frequencies (one per degree of freedom). The natural time scale for the long-term 
evolution of a quasiperiodic planetary system is set by the periods of the circulation 
of the nodes and perihelia, which in this case are a few million years. Periods in the 
motion of Pluto comparable to the length of the integration have been found in all 
long-term integrations. This is consistent with the chaotic character of the motion 
of Pluto, as indicated by our measurement of a positive Lyapunov exponent. 


Usually the measurement of a positive Lyapunov exponent provides a confirma- 
tion of what is already visible to the eye; that is, chaotic trajectories look irregular. 
In this case, except for the very long periods, the plots of Pluto’s orbital elements do 
not look particularly irregular. However, the irregularity of the motion does mani- 
fest itself in the power spectra. For a quasiperiodic trajectory the power spectrum 
of any orbital element is composed of integral linear combinations of fundamental 
Fig. 6.6. A portion of the power spectrum of Pluto’s h. In this graph A is the relative
amplitude. There appears to be a broad-band component to the spectrum. This is consistent
with the chaotic character of the motion of Pluto as indicated by the positive Lyapunov 
exponent. 


frequencies, where the number of fundamental frequencies is equal to the number 
of degrees of freedom. The power spectrum of a chaotic trajectory usually appears 
to have some broad-band component. 


A portion of the power spectrum of Pluto’s h is shown in Fig. 6.6. For comparison
the same portion of the power spectrum of Neptune’s h is shown in Fig. 6.7. This 
portion of the spectrum was chosen to avoid confusion introduced by nearby major 
lines. Hanning windows have been used to reduce spectral leakage; only the densely 
sampled part of the run was used in the computation of the Fourier transforms. 
The spectrum of Neptune is quite complicated but there is no evidence that it is 
not a line spectrum. On the other hand the spectrum of Pluto does appear to have
a broad-band component. Note that both of these spectra are computed from the 
same integration run, by means of the same numerical methods. They are subject 
to the same error processes, so the differences we see are dynamical in origin. The 
amplitudes in both graphs are normalized in the same way, so we can see that the 
broad-band components in Pluto’s spectrum are mostly larger than the discrete 
components in Neptune’s spectrum. 
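The role of the Hanning window can be illustrated with a small, self-contained experiment. The sketch below is not the authors' analysis code: it uses a synthetic sine wave, a naive discrete Fourier transform, and the periodic form of the Hann window, all chosen here for illustration. It shows that a tone lying between two frequency bins leaks far less power into distant bins once the record is windowed, which is what lets a genuine broad-band component be told apart from leakage of strong lines.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann (Hanning) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def amplitude_spectrum(x: list[float]) -> list[float]:
    """Magnitudes of the DFT for bins 0 .. n//2 (naive O(n^2) transform)."""
    n = len(x)
    return [
        abs(sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))) / n
        for f in range(n // 2 + 1)
    ]


n = 256
# A pure tone with 20.5 cycles per record: it falls exactly between two bins,
# which is the worst case for spectral leakage.
signal = [math.sin(2 * math.pi * 20.5 * k / n) for k in range(n)]

raw = amplitude_spectrum(signal)
windowed = amplitude_spectrum([s * w for s, w in zip(signal, hann(n))])

far_bin = 60  # a bin well away from the tone
leak_raw = raw[far_bin] / max(raw)
leak_windowed = windowed[far_bin] / max(windowed)
print(f"relative leakage without window:   {leak_raw:.2e}")
print(f"relative leakage with Hann window: {leak_windowed:.2e}")
```

The price of the window is a somewhat broader main peak, so closely spaced lines are harder to separate; that trade-off is one reason to pick a portion of the spectrum away from major lines.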


The lack of obvious irregularity in the orbital elements of Pluto indicates that 
the portion of the chaotic zone in which Pluto is currently moving is rather small. 
Since the global structure of the chaotic zone is not known it is not possible for us to 
Fig. 6.7. A portion of the power spectrum of Neptune’s h. In this graph A is the relative
amplitude. The spectrum is apparently a quite complicated line spectrum. That we do 
not observe a broad-band component is consistent with the motion being quasiperiodic. 


predict whether more irregular motions are likely. If the small chaotic zone in which
Pluto is found connects to a larger chaotic region, relatively sudden transitions can
be made to more irregular motion. This actually occurs for the motion of asteroids
near the 3:1 Kirkwood gap [13].


On the other hand, the fact that the time scale for divergence is only an order of 
magnitude larger than the fundamental time scales of the system indicates that the 
chaotic behavior is robust. It is not a narrow chaotic zone associated with a high- 
order resonance. Even though we do not know the sensitivity of the observed chaotic 
behavior to the uncertainties in parameters and initial conditions, and unmodeled 
effects, the large Lyapunov exponent suggests that the chaotic behavior of Pluto is 
characteristic of a range of solar systems including the actual solar system. 


6.6 Conclusions and implications of Pluto’s chaotic motion 


Our numerical model indicates that the motion of Pluto is chaotic. The largest
Lyapunov exponent is about $10^{-7.3}\ \mathrm{year}^{-1}$. Thus the e-folding time for the divergence
of trajectories is about 20 million years. It would not have been surprising to
discover an instability with characteristic time of the order of the age of the solar 
system because such an instability would not yet have had enough time to produce 
apparent damage. Thus, considering the age of the solar system, 20 million years 
is a remarkably short time scale for exponential divergence.
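As a quick check of the numbers, the e-folding time is the reciprocal of the Lyapunov exponent. The sketch below takes the exponent as 10^-7.3 per year, which reproduces the 20-million-year figure. The solar system age of 4.6 billion years is an assumed round value that does not come from the text.

```python
def e_folding_time_years(lyapunov_exponent_per_year: float) -> float:
    """E-folding time of the divergence is the reciprocal of the exponent."""
    return 1.0 / lyapunov_exponent_per_year


exponent = 10 ** -7.3                 # per year, as quoted in the text
tau = e_folding_time_years(exponent)

solar_system_age_years = 4.6e9        # assumption, not from the text
e_foldings = solar_system_age_years / tau

print(f"e-folding time: {tau / 1e6:.1f} million years")
print(f"e-foldings over the assumed age of the solar system: {e_foldings:.0f}")
```

A few hundred e-foldings over the assumed age is what makes 20 million years a short time scale in this context.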


The discovery of the chaotic nature of Pluto’s motion makes it more difficult 
to draw firm conclusions about the origin of Pluto. However, the orbit of Pluto is 
reminiscent of the orbits of asteroids on resonant chaotic trajectories, which typically 
evolve to high eccentricity and inclination [6]. This suggests that Pluto might have 
been formed with much lower eccentricity and inclination, as is typical of the other 
planets, and that it acquired its current peculiar orbit purely through deterministic 
chaotic dynamical processes. Of course, it is also possible that Pluto simply formed 
in an orbit near its current orbit. 


In our experiment Pluto is a zero-mass test particle. The real Pluto has a small 
mass. We expect that the inclusion of the actual mass of Pluto will not change 
the chaotic character of the motion. If so, Pluto’s irregular motion will chaotically 
pump the motion of the other members of the solar system and the chaotic behavior 
of Pluto would imply chaotic behavior of the rest of the solar system. 


THERE’S PLENTY OF ROOM AT THE
BOTTOM 


Richard Feynman 


7.1 An Invitation to Enter a New Field of Physics 


I imagine experimental physicists must often look with envy at men like Kamerlingh 
Onnes, who discovered a field like low temperature, which seems to be bottomless 
and in which one can go down and down. Such a man is then a leader and has some 
temporary monopoly in a scientific adventure. Percy Bridgman, in designing a way 
to obtain higher pressures, opened up another new field and was able to move into it 
and to lead us all along. The development of ever higher vacuum was a continuing 
development of the same kind. 


I would like to describe a field, in which little has been done, but in which an 
enormous amount can be done in principle. This field is not quite the same as the 
others in that it will not tell us much of fundamental physics (in the sense of, “What 
are the strange particles?”) but it is more like solid-state physics in the sense that
it might tell us much of great interest about the strange phenomena that occur in 
complex situations. Furthermore, a point that is most important is that it would 
have an enormous number of technical applications. 


What I want to talk about is the problem of manipulating and controlling things 
on a small scale. 


As soon as I mention this, people tell me about miniaturization, and how far it 
has progressed today. They tell me about electric motors that are the size of the 
nail on your small finger. And there is a device on the market, they tell me, by 
which you can write the Lord’s Prayer on the head of a pin. But that’s nothing; 
that’s the most primitive, halting step in the direction I intend to discuss. It is a 
staggeringly small world that is below. In the year 2000, when they look back at 
this age, they will wonder why it was not until the year 1960 that anybody began 
seriously to move in this direction. 


Why cannot we write the entire 24 volumes of the Encyclopaedia Britannica on
the head of a pin? 


Let’s see what would be involved. The head of a pin is a sixteenth of an inch 
across. If you magnify it by 25,000 diameters, the area of the head of the pin is 
then equal to the area of all the pages of the Encyclopaedia Britannica. Therefore,
all it is necessary to do is to reduce in size all the writing in the Encyclopaedia by 
25,000 times. Is that possible? The resolving power of the eye is about 1/120 of
an inch — that is roughly the diameter of one of the little dots on the fine half- 
tone reproductions in the Encyclopaedia. This, when you demagnify it by 25,000 
times, is still 80 angstroms in diameter — 32 atoms across, in an ordinary metal. 
In other words, one of those dots still would contain in its area 1,000 atoms. So, 
each dot can easily be adjusted in size as required by the photoengraving, and there 
is no question that there is enough room on the head of a pin to put all of the 
Encyclopaedia Britannica.
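The arithmetic of this estimate is easy to reproduce. The sketch below assumes an atomic spacing of 2.5 angstroms, a value not stated in the text but implied by equating 80 angstroms with 32 atoms, and it treats a halftone dot as a circle.

```python
import math

INCH_IN_ANGSTROMS = 2.54e8     # 1 inch = 2.54 cm = 2.54e8 angstroms
REDUCTION = 25_000             # linear demagnification used in the text
ATOM_SPACING_ANGSTROM = 2.5    # assumption: implied by "80 angstroms = 32 atoms"

# Width of a pin head (1/16 inch) after magnifying by 25,000 diameters.
magnified_pin_inches = (1 / 16) * REDUCTION

# A halftone dot (1/120 inch) after demagnifying by the same factor.
dot_angstrom = (1 / 120) * INCH_IN_ANGSTROMS / REDUCTION
atoms_across = dot_angstrom / ATOM_SPACING_ANGSTROM
atoms_in_dot = math.pi / 4 * atoms_across ** 2   # area of a circular dot, in atoms

print(f"magnified pin head: {magnified_pin_inches:.1f} inches across")
print(f"demagnified dot: {dot_angstrom:.0f} angstroms, about {atoms_across:.0f} atoms across")
print(f"atoms in the area of one dot: about {atoms_in_dot:.0f}")
```

With these inputs the dot comes out near 85 angstroms and roughly 900 atoms in area, in line with the rounded figures of 80 angstroms and 1,000 atoms.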


Furthermore, it can be read if it is so written. Let’s imagine that it is written 
in raised letters of metal; that is, where the black is in the Encyclopaedia, we have 
raised letters of metal that are actually 1/25,000 of their ordinary size. How would 
we read it? 


If we had something written in such a way, we could read it using techniques in 
common use today. (They will undoubtedly find a better way when we do actually 
have it written, but to make my point conservatively I shall just take techniques we 
know today.) We would press the metal into a plastic material and make a mold of 
it, then peel the plastic off very carefully, evaporate silica into the plastic to get a 
very thin film, then shadow it by evaporating gold at an angle against the silica so 
that all the little letters will appear clearly, dissolve the plastic away from the silica 
film, and then look through it with an electron microscope! 


There is no question that if the thing were reduced by 25,000 times in the form 
of raised letters on the pin, it would be easy for us to read it today. Furthermore,
there is no question that we would find it easy to make copies of the master; we 
would just need to press the same metal plate again into plastic and we would have 
another copy. 


7.2 How do we write small? 


The next question is: How do we write it? We have no standard technique to do 
this now. But let me argue that it is not as difficult as it first appears to be. We 
can reverse the lenses of the electron microscope in order to demagnify as well as 
magnify. A source of ions, sent through the microscope lenses in reverse, could be
focused to a very small spot. We could write with that spot like we write in a TV 
cathode ray oscilloscope, by going across in lines, and having an adjustment which 
determines the amount of material which is going to be deposited as we scan in 
lines. 


This method might be very slow because of space charge limitations. There will 
be more rapid methods. We could first make, perhaps by some photo process, a 
screen which has holes in it in the form of the letters. Then we would strike an 
arc behind the holes and draw metallic ions through the holes; then we could again 
use our system of lenses and make a small image in the form of ions, which would 
deposit the metal on the pin.


A simpler way might be this (though I am not sure it would work): We take 
light and, through an optical microscope running backwards, we focus it onto a very 
small photoelectric screen. Then electrons come away from the screen where the 
light is shining. These electrons are focused down in size by the electron microscope 
lenses to impinge directly upon the surface of the metal. Will such a beam etch 
away the metal if it is run long enough? I don’t know. If it doesn’t work for a metal 
surface, it must be possible to find some surface with which to coat the original pin 
so that, where the electrons bombard, a change is made which we could recognize 
later. 


There is no intensity problem in these devices — not what you are used to in 
magnification, where you have to take a few electrons and spread them over a bigger 
and bigger screen; it is just the opposite. The light which we get from a page is 
concentrated onto a very small area so it is very intense. The few electrons which 
come from the photoelectric screen are demagnified down to a very tiny area so 
that, again, they are very intense. I don’t know why this hasn’t been done yet! 


That’s the Encyclopaedia Britannica on the head of a pin, but let’s consider
all the books in the world. The Library of Congress has approximately 9 million 
volumes; the British Museum Library has 5 million volumes; there are also 5 million 
volumes in the National Library in France. Undoubtedly there are duplications, so 
let us say that there are some 24 million volumes of interest in the world. 


What would happen if I print all this down at the scale we have been discussing?
How much space would it take? It would take, of course, the area of about a million 
pinheads because, instead of there being just the 24 volumes of the Encyclopaedia, 
there are 24 million volumes. The million pinheads can be put in a square of a 
thousand pins on a side, or an area of about 3 square yards. That is to say, the 
silica replica with the paper-thin backing of plastic, with which we have made the 
copies, with all this information, is on an area of approximately the size of 35 
pages of the Encyclopaedia. That is about half as many pages as there are in this 
magazine. All of the information which all of mankind has ever recorded in books
can be carried around in a pamphlet in your hand — and not written in code, but 
a simple reproduction of the original pictures, engravings, and everything else on a 
small scale without loss of resolution. 
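The "million pinheads in 3 square yards" figure follows directly from the numbers already given. A minimal sketch of that arithmetic, using only values from the text (24 million volumes, 24 volumes per pin head, pin heads one sixteenth of an inch across):

```python
import math

volumes_in_world = 24_000_000
volumes_per_pinhead = 24           # one Encyclopaedia per pin head
pin_diameter_inches = 1 / 16

pinheads = volumes_in_world // volumes_per_pinhead
pins_per_side = math.isqrt(pinheads)              # pin heads packed in a square grid
side_inches = pins_per_side * pin_diameter_inches
area_square_yards = (side_inches / 36) ** 2       # 36 inches per yard

print(f"{pinheads:,} pin heads, {pins_per_side} on a side")
print(f"side of the square: {side_inches} inches")
print(f"area: {area_square_yards:.2f} square yards")
```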


What would our librarian at Caltech say, as she runs all over from one building 
to another, if I tell her that, ten years from now, all of the information that she is 
struggling to keep track of — 120,000 volumes, stacked from the floor to the ceiling, 
drawers full of cards, storage rooms full of the older books — can be kept on just 
one library card! When the University of Brazil, for example, finds that their library 
is burned, we can send them a copy of every book in our library by striking off a 
copy from the master plate in a few hours and mailing it in an envelope no bigger 
or heavier than any other ordinary air mail letter. 
Now, the name of this talk is “There is Plenty of Room at the Bottom” — not
just “There is Room at the Bottom.” What I have demonstrated is that there is 
room — that you can decrease the size of things in a practical way. I now want to 
show that there is plenty of room. I will not now discuss how we are going to do it, 
but only what is possible in principle — in other words, what is possible according 
to the laws of physics. I am not inventing anti-gravity, which is possible someday
only if the laws are not what we think. I am telling you what could be done if the 
laws are what we think; we are not doing it simply because we haven’t yet gotten 
around to it. 


7.3 Information on a small scale 


Suppose that, instead of trying to reproduce the pictures and all the information 
directly in its present form, we write only the information content in a code of dots 
and dashes, or something like that, to represent the various letters. Each letter 
represents six or seven “bits” of information; that is, you need only about six or 
seven dots or dashes for each letter. Now, instead of writing everything, as I did 
before, on the surface of the head of a pin, I am going to use the interior of the 
material as well. 


Let us represent a dot by a small spot of one metal, the next dash, by an 
adjacent spot of another metal, and so on. Suppose, to be conservative, that a bit 
of information is going to require a little cube of atoms 5 times 5 times 5 — that 
is 125 atoms. Perhaps we need a hundred and some odd atoms to make sure that 
the information is not lost through diffusion, or through some other process. 


I have estimated how many letters there are in the Encyclopaedia, and I have 
assumed that each of my 24 million books is as big as an Encyclopaedia volume, 
and have calculated, then, how many bits of information there are ($10^{15}$). For each
bit I allow 100 atoms. And it turns out that all of the information that man has 
carefully accumulated in all the books in the world can be written in this form in 
a cube of material one two-hundredth of an inch wide — which is the barest piece 
of dust that can be made out by the human eye. So there is plenty of room at the 
bottom! Don’t tell me about microfilm! 
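The size of that cube can be checked in a few lines. The sketch assumes a total of 10^15 bits and 100 atoms per bit, as in the estimate above, plus an atomic spacing of 2.5 angstroms, which is an assumption carried over from the earlier "80 angstroms = 32 atoms" figure rather than a value given at this point in the talk.

```python
INCH_IN_ANGSTROMS = 2.54e8     # 1 inch = 2.54e8 angstroms
ATOM_SPACING_ANGSTROM = 2.5    # assumption, not stated at this point in the text

total_bits = 1e15              # all books in the world, per the estimate above
atoms_per_bit = 100

total_atoms = total_bits * atoms_per_bit
atoms_per_edge = total_atoms ** (1 / 3)           # cube holding that many atoms
edge_inches = atoms_per_edge * ATOM_SPACING_ANGSTROM / INCH_IN_ANGSTROMS

print(f"atoms along one edge: {atoms_per_edge:.3g}")
print(f"cube edge: 1/{1 / edge_inches:.0f} of an inch")
```

Under these assumptions the edge is a little over one two-hundredth of an inch, matching the figure quoted in the text.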


This fact — that enormous amounts of information can be carried in an exceed- 
ingly small space — is, of course, well known to the biologists, and resolves the 
mystery which existed before we understood all this clearly, of how it could be that, 
in the tiniest cell, all of the information for the organization of a complex creature 
such as ourselves can be stored. All this information — whether we have brown 
eyes, or whether we think at all, or that in the embryo the jawbone should first 
develop with a little hole in the side so that later a nerve can grow through it — 
all this information is contained in a very tiny fraction of the cell in the form of 
long-chain DNA molecules in which approximately 50 atoms are used for one bit of 
information about the cell. 


7.4 Better electron microscopes


If I have written in a code, with 5 times 5 times 5 atoms to a bit, the question 
is: How could I read it today? The electron microscope is not quite good enough;
with the greatest care and effort, it can only resolve about 10 angstroms. I would
like to try and impress upon you while I am talking about all of these things on 
a small scale, the importance of improving the electron microscope by a hundred 
times. It is not impossible; it is not against the laws of diffraction of the electron. 
The wave length of the electron in such a microscope is only 1/20 of an angstrom. 
So it should be possible to see the individual atoms. What good would it be to see 
individual atoms distinctly? 
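The quoted wavelength can be related to an accelerating voltage through the relativistic de Broglie formula. The talk does not state the voltage, so the values tried below are assumptions chosen for illustration; the physical constants are the standard SI values.

```python
import math

H = 6.62607015e-34           # Planck constant, J s
M0 = 9.1093837015e-31        # electron rest mass, kg
E = 1.602176634e-19          # elementary charge, C
C = 299_792_458.0            # speed of light, m/s


def electron_wavelength_angstrom(volts: float) -> float:
    """Relativistic de Broglie wavelength of an electron accelerated through `volts`."""
    kinetic = E * volts
    momentum = math.sqrt(2 * M0 * kinetic * (1 + kinetic / (2 * M0 * C ** 2)))
    return H / momentum * 1e10   # metres to angstroms


# Assumed accelerating voltages; the talk does not specify one.
for kilovolts in (50, 60, 100):
    lam = electron_wavelength_angstrom(kilovolts * 1e3)
    print(f"{kilovolts:>3} kV: {lam:.4f} angstrom")
```

A voltage of roughly 60 kV gives a wavelength close to 1/20 of an angstrom, some 200 times smaller than the 10-angstrom resolution mentioned above. This is why diffraction is not the limiting factor.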


We have friends in other fields — in biology, for instance. We physicists of- 
ten look at them and say, “You know the reason you fellows are making so little 
progress?” (Actually I don’t know any field where they are making more rapid 
progress than they are in biology today.) “You should use more mathematics, like 
we do.” They could answer us — but they’re polite, so I'll answer for them: “What 
you should do in order for us to make more rapid progress is to make the electron 
microscope 100 times better.” 


What are the most central and fundamental problems of biology today? They 
are questions like: What is the sequence of bases in the DNA? What happens when 
you have a mutation? How is the base order in the DNA connected to the order 
of amino acids in the protein? What is the structure of the RNA; is it single-chain 
or double-chain, and how is it related in its order of bases to the DNA? What is 
the organization of the microsomes? How are proteins synthesized? Where does 
the RNA go? How does it sit? Where do the proteins sit? Where do the amino 
acids go in? In photosynthesis, where is the chlorophyll; how is it arranged; where 
are the carotenoids involved in this thing? What is the system of the conversion of 
light into chemical energy? 


It is very easy to answer many of these fundamental biological questions; you 
just look at the thing! You will see the order of bases in the chain; you will see the 
structure of the microsome. Unfortunately, the present microscope sees at a scale 
which is just a bit too crude. Make the microscope one hundred times more powerful, 
and many problems of biology would be made very much easier. I exaggerate, of 
course, but the biologists would surely be very thankful to you — and they would 
prefer that to the criticism that they should use more mathematics. 


The theory of chemical processes today is based on theoretical physics. In this 
sense, physics supplies the foundation of chemistry. But chemistry also has analysis. 
If you have a strange substance and you want to know what it is, you go through a 
long and complicated process of chemical analysis. You can analyze almost anything 
today, so I am a little late with my idea. But if the physicists wanted to, they could 
also dig under the chemists in the problem of chemical analysis. It would be very 
easy to make an analysis of any complicated chemical substance; all one would have 
to do would be to look at it and see where the atoms are. The only trouble is
that the electron microscope is one hundred times too poor. (Later, I would like 
to ask the question: Can the physicists do something about the third problem of 
chemistry — namely, synthesis? Is there a physical way to synthesize any chemical 
substance?) 


The reason the electron microscope is so poor is that the f-value of the lenses is 
only 1 part to 1,000; you don’t have a big enough numerical aperture. And I know 
that there are theorems which prove that it is impossible, with axially symmetrical 
stationary field lenses, to produce an f-value any bigger than so and so; and therefore 
the resolving power at the present time is at its theoretical maximum. But in every 
theorem there are assumptions. Why must the field be symmetrical? I put this out 
as a challenge: Is there no way to make the electron microscope more powerful?


7.5 The marvelous biological system 


The biological example of writing information on a small scale has inspired me to 
think of something that should be possible. Biology is not simply writing informa- 
tion; it is doing something about it. A biological system can be exceedingly small. 
Many of the cells are very tiny, but they are very active; they manufacture various 
substances; they walk around; they wiggle; and they do all kinds of marvelous things 
— all on a very small scale. Also, they store information. Consider the possibility 
that we too can make a thing very small which does what we want — that we can 
manufacture an object that maneuvers at that level! 


There may even be an economic point to this business of making things very 
small. Let me remind you of some of the problems of computing machines. In 
computers we have to store an enormous amount of information. The kind of writing 
that I was mentioning before, in which I had everything down as a distribution of 
metal, is permanent. Much more interesting to a computer is a way of writing, 
erasing, and writing something else. (This is usually because we don’t want to 
waste the material on which we have just written. Yet if we could write it in a very 
small space, it wouldn’t make any difference; it could just be thrown away after it 
was read. It doesn’t cost very much for the material.)


7.6 Miniaturizing the computer 


I don’t know how to do this on a small scale in a practical way, but I do know that 
computing machines are very large; they fill rooms. Why can’t we make them very 
small, make them of little wires, little elements — and by little, I mean little. For 
instance, the wires should be 10 or 100 atoms in diameter, and the circuits should be 
a few thousand angstroms across. Everybody who has analyzed the logical theory 
of computers has come to the conclusion that the possibilities of computers are 
very interesting — if they could be made to be more complicated by several orders 
of magnitude. If they had millions of times as many elements, they could make
judgements. They would have time to calculate what is the best way to make the 
calculation that they are about to make. They could select the method of analysis 
which, from their experience, is better than the one that we would give to them. 
And in many other ways, they would have new qualitative features. 


If I look at your face I immediately recognize that I have seen it before. (Actually, 
my friends will say I have chosen an unfortunate example here for the subject of 
this illustration. At least I recognize that it is a man and not an apple.) Yet there is 
no machine which, with that speed, can take a picture of a face and say even that it 
is a man; and much less that it is the same man that you showed it before — unless 
it is exactly the same picture. If the face is changed; if I am closer to the face; if I
am further from the face; if the light changes — I recognize it anyway. Now, this 
little computer I carry in my head is easily able to do that. The computers that 
we build are not able to do that. The number of elements in this bone box of mine
is enormously greater than the number of elements in our “wonderful” computers.
But our mechanical computers are too big; the elements in this box are microscopic. 
I want to make some that are submicroscopic.


If we wanted to make a computer that had all these marvelous extra qualita- 
tive abilities, we would have to make it, perhaps, the size of the Pentagon. This 
has several disadvantages. First, it requires too much material; there may not be 
enough germanium in the world for all the transistors which would have to be put 
into this enormous thing. There is also the problem of heat generation and power 
consumption; TVA would be needed to run the computer. But an even more prac- 
tical difficulty is that the computer would be limited to a certain speed. Because 
of its large size, there is finite time required to get the information from one place 
to another. The information cannot go any faster than the speed of light — so, 
ultimately, when our computers get faster and faster and more and more elaborate, 
we will have to make them smaller and smaller. 
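The speed limit set by size can be put in numbers. The sketch below computes the time for a signal to cross a machine at the speed of light, which bounds how often distant parts can exchange information. The 300 m width for a building-sized machine and the 1 mm width for a miniaturized one are assumed values for illustration, not figures from the talk.

```python
SPEED_OF_LIGHT = 299_792_458.0   # m/s


def crossing_time_seconds(size_m: float) -> float:
    """Minimum time for a signal to travel across a machine of the given size."""
    return size_m / SPEED_OF_LIGHT


# Assumed sizes, for illustration only.
machines = {
    "building-sized machine (300 m)": 300.0,
    "miniaturized machine (1 mm)": 1e-3,
}

for label, size in machines.items():
    t = crossing_time_seconds(size)
    print(f"{label}: {t:.2e} s per crossing, at most {1 / t:.2e} crossings per second")
```

With these sizes the large machine needs about a microsecond per crossing, while the small one needs a few picoseconds.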


But there is plenty of room to make them smaller. There is nothing that I can see 
in the physical laws that says the computer elements cannot be made enormously 
smaller than they are now. In fact, there may be certain advantages. 


7.7 Miniaturization by evaporation 


How can we make such a device? What kind of manufacturing processes would 
we use? One possibility we might consider, since we have talked about writing by 
putting atoms down in a certain arrangement, would be to evaporate the material, 
then evaporate the insulator next to it. Then, for the next layer, evaporate another 
position of a wire, another insulator, and so on. So, you simply evaporate until you 
have a block of stuff which has the elements — coils and condensers, transistors 
and so on — of exceedingly fine dimensions. 
But I would like to discuss, just for amusement, that there are other possibilities.
Why can’t we manufacture these small computers somewhat like we manufacture 
the big ones? Why can’t we drill holes, cut things, solder things, stamp things 
out, mold different shapes all at an infinitesimal level? What are the limitations 
as to how small a thing has to be before you can no longer mold it? How many 
times when you are working on something frustratingly tiny like your wife’s wrist 
watch, have you said to yourself, “If I could only train an ant to do this!” What I 
would like to suggest is the possibility of training an ant to train a mite to do this. 
What are the possibilities of small but movable machines? They may or may not 
be useful, but they surely would be fun to make. 


Consider any machine — for example, an automobile — and ask about the 
problems of making an infinitesimal machine like it. Suppose, in the particular 
design of the automobile, we need a certain precision of the parts; we need an 
accuracy, let’s suppose, of 4/10,000 of an inch. If things are more inaccurate than 
that in the shape of the cylinder and so on, it isn’t going to work very well. If I 
make the thing too small, I have to worry about the size of the atoms; I can’t make 
a circle of “balls” so to speak, if the circle is too small. So, if I make the error, 
corresponding to 4/10,000 of an inch, correspond to an error of 10 atoms, it turns 
out that I can reduce the dimensions of an automobile 4,000 times, approximately 
— so that it is 1 mm. across. Obviously, if you redesign the car so that it would 
work with a much larger tolerance, which is not at all impossible, then you could 
make a much smaller device. 
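The reduction factor in this estimate can be reproduced as follows. The atomic spacing of 2.5 angstroms and the 4 m length of a full-size automobile are assumptions introduced here; the tolerance of 4/10,000 of an inch and the 10-atom error are from the text.

```python
INCH_IN_ANGSTROMS = 2.54e8      # 1 inch = 2.54e8 angstroms
ATOM_SPACING_ANGSTROM = 2.5     # assumption, not stated in the text
CAR_LENGTH_M = 4.0              # assumption: rough length of a full-size automobile

tolerance_angstrom = (4 / 10_000) * INCH_IN_ANGSTROMS   # required precision at full size
smallest_error_angstrom = 10 * ATOM_SPACING_ANGSTROM    # the same precision as 10 atoms

reduction = tolerance_angstrom / smallest_error_angstrom
small_car_mm = CAR_LENGTH_M / reduction * 1000

print(f"full-size tolerance: {tolerance_angstrom:.0f} angstroms")
print(f"allowed reduction: about {reduction:.0f} times")
print(f"reduced automobile: about {small_car_mm:.2f} mm long")
```

Under these assumptions the reduction is close to 4,000 and the car shrinks to about a millimetre, as stated.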


It is interesting to consider what the problems are in such small machines. 
Firstly, with parts stressed to the same degree, the forces go as the area you are 
reducing, so that things like weight and inertia are of relatively no importance. 
The strength of material, in other words, is very much greater in proportion. The 
stresses and expansion of the flywheel from centrifugal force, for example, would be 
the same proportion only if the rotational speed is increased in the same proportion 
as we decrease the size. On the other hand, the metals that we use have a grain 
structure, and this would be very annoying at small scale because the material is 
not homogeneous. Plastics and glass and things of this amorphous nature are very 
much more homogeneous, and so we would have to make our machines out of such 
materials. 
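The scaling argument in this paragraph can be written out explicitly. The relations below are a sketch under stated assumptions: parts are stressed to the same level $\sigma$, all lengths scale with a single size $L$, and the flywheel is treated as a thin rim of radius $r$ and density $\rho$ spinning at angular speed $\omega$ (the thin-rim formula is a standard approximation, not something given in the talk).

```latex
\begin{align*}
F_{\text{strength}} &\propto \sigma L^{2}, \qquad W \propto \rho g L^{3}
  \;\Longrightarrow\; \frac{W}{F_{\text{strength}}} \propto L, \\
\sigma_{\text{rim}} &\approx \rho\,\omega^{2} r^{2}
  \;\Longrightarrow\; \sigma_{\text{rim}} \text{ is unchanged under } r \to s\,r
  \text{ if and only if } \omega \to \omega / s .
\end{align*}
```

The first line says that weight and inertia shrink faster than the force a part can carry, so they matter less and less as $L$ decreases. The second says that a flywheel reduced by a factor $s < 1$ must spin $1/s$ times faster to reach the same stress.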


There are problems associated with the electrical part of the system — with 
the copper wires and the magnetic parts. The magnetic properties on a very small 
scale are not the same as on a large scale; there is the “domain” problem involved. 
A big magnet made of millions of domains can only be made on a small scale with 
one domain. The electrical equipment won’t simply be scaled down; it has to be 
redesigned. But I can see no reason why it can’t be redesigned to work again. 


7.8 Problems of lubrication 


Lubrication involves some interesting points. The effective viscosity of oil would be 
higher and higher in proportion as we went down (and if we increase the speed as 
much as we can). If we don’t increase the speed so much, and change from oil to 
kerosene or some other fluid, the problem is not so bad. But actually we may not 
have to lubricate at all! We have a lot of extra force. Let the bearings run dry; 
they won’t run hot because the heat escapes away from such a small device very, 
very rapidly. This rapid heat loss would prevent the gasoline from exploding, so 
an internal combustion engine is impossible. Other chemical reactions, liberating 
energy when cold, can be used. Probably an external supply of electrical power 
would be most convenient for such small machines. 


What would be the utility of such machines? Who knows? Of course, a small 
automobile would only be useful for the mites to drive around in, and I suppose 
our Christian interests don’t go that far. However, we did note the possibility of 
the manufacture of small elements for computers in completely automatic factories, 
containing lathes and other machine tools at the very small level. The small lathe 
would not have to be exactly like our big lathe. I leave to your imagination the 
improvement of the design to take full advantage of the properties of things on a 
small scale, and in such a way that the fully automatic aspect would be easiest to 
manage. 


A friend of mine (Albert R. Hibbs) suggests a very interesting possibility for 
relatively small machines. He says that, although it is a very wild idea, it would 
be interesting in surgery if you could swallow the surgeon. You put the mechanical 
surgeon inside the blood vessel and it goes into the heart and “looks” around. (Of 
course the information has to be fed out.) It finds out which valve is the faulty one 
and takes a little knife and slices it out. Other small machines might be permanently 
incorporated in the body to assist some inadequately-functioning organ. 


Now comes the interesting question: How do we make such a tiny mechanism? 
I leave that to you. However, let me suggest one weird possibility. You know, in 
the atomic energy plants they have materials and machines that they can’t handle 
directly because they have become radioactive. To unscrew nuts and put on bolts 
and so on, they have a set of master and slave hands, so that by operating a set of 
levers here, you control the “hands” there, and can turn them this way and that so 
you can handle things quite nicely. 


Most of these devices are actually made rather simply, in that there is a par- 
ticular cable, like a marionette string, that goes directly from the controls to the 
“hands.” But, of course, things also have been made using servo motors, so that the 
connection between the one thing and the other is electrical rather than mechanical. 
When you turn the levers, they turn a servo motor, and it changes the electrical 
currents in the wires, which repositions a motor at the other end. 


Now, I want to build much the same device — a master-slave system which 
operates electrically. But I want the slaves to be made especially carefully by 
modern large-scale machinists so that they are one-fourth the scale of the “hands” 
that you ordinarily maneuver. So you have a scheme by which you can do things 
at one-quarter scale anyway — the little servo motors with little hands play with 
little nuts and bolts; they drill little holes; they are four times smaller. Aha! So 
I manufacture a quarter-size lathe; I manufacture quarter-size tools; and I make, 
at the one-quarter scale, still another set of hands again relatively one-quarter size! 
This is one-sixteenth size, from my point of view. And after I finish doing this 
I wire directly from my large-scale system, through transformers perhaps, to the 
one-sixteenth-size servo motors. Thus I can now manipulate the one-sixteenth size 
hands. 


Well, you get the principle from there on. It is rather a difficult program, but 
it is a possibility. You might say that one can go much farther in one step than 
from one to four. Of course, this has all to be designed very carefully and it is not 
necessary simply to make it like hands. If you thought of it very carefully, you could 
probably arrive at a much better system for doing such things. 


If you work through a pantograph, even today, you can get much more than a 
factor of four in even one step. But you can’t work directly through a pantograph 
which makes a smaller pantograph which then makes a smaller pantograph — be- 
cause of the looseness of the holes and the irregularities of construction. The end 
of the pantograph wiggles with a relatively greater irregularity than the irregularity 
with which you move your hands. In going down this scale, I would find the end of 
the pantograph on the end of the pantograph on the end of the pantograph shaking 
so badly that it wasn’t doing anything sensible at all. 


At each stage, it is necessary to improve the precision of the apparatus. If, 
for instance, having made a small lathe with a pantograph, we find its lead screw 
irregular — more irregular than the large-scale one — we could lap the lead screw 
against breakable nuts that you can reverse in the usual way back and forth until 
this lead screw is, at its scale, as accurate as our original lead screws, at our scale. 


We can make flats by rubbing unflat surfaces in triplicates together — in three 
pairs — and the flats then become flatter than the thing you started with. Thus, 
it is not impossible to improve precision on a small scale by the correct operations. 
So, when we build this stuff, it is necessary at each step to improve the accuracy 
of the equipment by working for a while down there, making accurate lead screws, 
Johansson blocks, and all the other materials which we use in accurate machine work 
at the higher level. We have to stop at each level and manufacture all the stuff to 
go to the next level — a very long and very difficult program. Perhaps you can 
figure a better way than that to get down to small scale more rapidly. 


Yet, after all this, you have just got one little baby lathe four thousand times 
smaller than usual. But we were thinking of making an enormous computer, which 
we were going to build by drilling holes on this lathe to make little washers for the 
computer. How many washers can you manufacture on this one lathe? 


7.9 A hundred tiny hands 


When I make my first set of slave “hands” at one-fourth scale, I am going to make 
ten sets. I make ten sets of “hands,” and I wire them to my original levers so 
they each do exactly the same thing at the same time in parallel. Now, when I am 
making my new devices one-quarter again as small, I let each one manufacture ten 
copies, so that I would have a hundred “hands” at the 1/16th size. 
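
Carried on down to the scale of about 1/4000 mentioned earlier, the multiplication can be tallied as in the sketch below. The loop and its variable names are illustrative only; the factor of four per stage and the ten copies per stage are the figures given in the talk.

```python
# Count the quarter-scale stages needed to reach roughly 1/4000 scale,
# with each device making ten copies at every stage.
SCALE_PER_STAGE = 4
COPIES_PER_STAGE = 10
TARGET_REDUCTION = 4000

stages = 0
reduction = 1
devices = 1
while reduction < TARGET_REDUCTION:
    reduction *= SCALE_PER_STAGE
    devices *= COPIES_PER_STAGE
    stages += 1
    print(f"stage {stages}: 1/{reduction} scale, {devices:,} sets")
```

Six stages of one-quarter scaling give 1/4096, and ten copies per stage over six stages give a million devices.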


Where am I going to put the million lathes that I am going to have? Why, 
there is nothing to it; the volume is much less than that of even one full-scale lathe. 
For instance, if I made a billion little lathes, each 1/4000 of the scale of a regular 
lathe, there are plenty of materials and space available because in the billion little 
ones there is less than 2 percent of the materials in one big lathe. It doesn’t cost 
anything for materials, you see. So I want to build a billion tiny factories, models of 
each other, which are manufacturing simultaneously, drilling holes, stamping parts, 
and so on. 
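
The "less than 2 percent" figure follows from the fact that volume, and hence material, scales as the cube of the linear size. A minimal check, with variable names chosen here for illustration:

```python
# Material in a billion lathes at 1/4000 scale, as a fraction of one full-size lathe.
n_lathes = 1_000_000_000
scale = 1 / 4000                     # linear scale of each small lathe

fraction = n_lathes * scale**3       # volume scales as the cube of the linear size
print(f"material fraction = {fraction:.4%}")
```

The fraction is 10^9 / 4000^3, a little under 1.6 percent.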


As we go down in size, there are a number of interesting problems that arise. All 
things do not simply scale down in proportion. There is the problem that materials 
stick together by the molecular (Van der Waals) attractions. It would be like this: 
After you have made a part and you unscrew the nut from a bolt, it isn’t going to 
fall down because the gravity isn’t appreciable; it would even be hard to get it off 
the bolt. It would be like those old movies of a man with his hands full of molasses, 
trying to get rid of a glass of water. There will be several problems of this nature 
that we will have to be ready to design for. 


7.10 Rearranging the atoms 


But I am not afraid to consider the final question as to whether, ultimately — in 
the great future — we can arrange the atoms the way we want; the very atoms, all 
the way down! What would happen if we could arrange the atoms one by one the 
way we want them (within reason, of course; you can’t put them so that they are 
chemically unstable, for example). 


Up to now, we have been content to dig in the ground to find minerals. We 
heat them and we do things on a large scale with them, and we hope to get a 
pure substance with just so much impurity, and so on. But we must always accept 
some atomic arrangement that nature gives us. We haven’t got anything, say, with 
a “checkerboard” arrangement, with the impurity atoms exactly arranged 1,000 
angstroms apart, or in some other particular pattern. 


What could we do with layered structures with just the right layers? What 
would the properties of materials be if we could really arrange the atoms the way 
we want them? They would be very interesting to investigate theoretically. I can’t 
see exactly what would happen, but I can hardly doubt that when we have some 
control of the arrangement of things on a small scale we will get an enormously 
greater range of possible properties that substances can have, and of different things 
that we can do. 


Consider, for example, a piece of material in which we make little coils and 
condensers (or their solid state analogs) 1,000 or 10,000 angstroms in a circuit, one 
right next to the other, over a large area, with little antennas sticking out at the 
other end — a whole series of circuits. Is it possible, for example, to emit light from 
a whole set of antennas, like we emit radio waves from an organized set of antennas 
to beam the radio programs to Europe? The same thing would be to beam the light 
out in a definite direction with very high intensity. (Perhaps such a beam is not 
very useful technically or economically.) 


I have thought about some of the problems of building electric circuits on a small 
scale, and the problem of resistance is serious. If you build a corresponding circuit 
on a small scale, its natural frequency goes up, since the wave length goes down 
as the scale; but the skin depth only decreases with the square root of the scale 
ratio, and so resistive problems are of increasing difficulty. Possibly we can beat 
resistance through the use of superconductivity if the frequency is not too high, or 
by other tricks. 
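
The resistance argument can be sketched as follows, assuming the classical skin-effect regime in which the skin depth δ is much smaller than the wire radius a. The symbols are introduced here for illustration: scale factor s < 1, resistivity ρ, permeability μ, wire length ℓ, angular frequency ω, inductance L_ind, and quality factor Q.

```latex
% Scale every length by s < 1:  \ell \to s\ell,\quad a \to s a,\quad \omega \to \omega / s
\delta = \sqrt{\frac{2\rho}{\mu\,\omega}} \;\to\; \sqrt{s}\,\delta
\qquad
R \simeq \frac{\rho\,\ell}{2\pi a\,\delta}
  \;\to\; \frac{\rho\,(s\ell)}{2\pi\,(s a)\,(\sqrt{s}\,\delta)} = \frac{R}{\sqrt{s}}

% The inductive reactance is unchanged, since inductance scales as s and \omega as 1/s:
\omega L_{\text{ind}} \;\to\; \frac{\omega}{s}\,(s L_{\text{ind}}) = \omega L_{\text{ind}}
\qquad\Longrightarrow\qquad
Q = \frac{\omega L_{\text{ind}}}{R} \;\to\; \sqrt{s}\,Q
```

Under these assumptions the resistance rises and the quality factor falls as the square root of the scale ratio. Once the skin depth is no longer small compared with the wire radius, the whole cross-section conducts and this estimate no longer applies.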


7.11 Atoms in a small world 


When we get to the very, very small world — say circuits of seven atoms — we have 
a lot of new things that would happen that represent completely new opportunities 
for design. Atoms on a small scale behave like nothing on a large scale, for they 
satisfy the laws of quantum mechanics. So, as we go down and fiddle around with 
the atoms down there, we are working with different laws, and we can expect to 
do different things. We can manufacture in different ways. We can use, not just 
circuits, but some system involving the quantized energy levels, or the interactions 
of quantized spins, etc. 


Another thing we will notice is that, if we go down far enough, all of our devices 
can be mass produced so that they are absolutely perfect copies of one another. We 
cannot build two large machines so that the dimensions are exactly the same. But 
if your machine is only 100 atoms high, you only have to get it correct to one-half 
of one percent to make sure the other machine is exactly the same size — namely, 
100 atoms high! 


At the atomic level, we have new kinds of forces and new kinds of possibilities, 
new kinds of effects. The problems of manufacture and reproduction of materials 
will be quite different. I am, as I said, inspired by the biological phenomena in 
which chemical forces are used in repetitious fashion to produce all kinds of weird 
effects (one of which is the author). The principles of physics, as far as I can see, 
do not speak against the possibility of maneuvering things atom by atom. It is not 
an attempt to violate any laws; it is something, in principle, that can be done; but 
in practice, it has not been done because we are too big. 


Ultimately, we can do chemical synthesis. A chemist comes to us and says, 
“Look, I want a molecule that has the atoms arranged thus and so; make me that 
molecule.” The chemist does a mysterious thing when he wants to make a molecule. 
He sees that it has got that ring, so he mixes this and that, and he shakes it, and 
he fiddles around. And, at the end of a difficult process, he usually does succeed in 
synthesizing what he wants. By the time I get my devices working, so that we can 
do it by physics, he will have figured out how to synthesize absolutely anything, so 
that this will really be useless. 


But it is interesting that it would be, in principle, possible (I think) for a physicist 
to synthesize any chemical substance that the chemist writes down. Give the orders 
and the physicist synthesizes it. How? Put the atoms down where the chemist 
says, and so you make the substance. The problems of chemistry and biology can 
be greatly helped if our ability to see what we are doing, and to do things on an 
atomic level, is ultimately developed — a development which I think cannot be 
avoided. Now, you might say, “Who should do this and why should they do it?” 
Well, I pointed out a few of the economic applications, but I know that the reason 
that you would do it might be just for fun. But have some fun! Let’s have a 
competition between laboratories. Let one laboratory make a tiny motor which it 
sends to another lab which sends it back with a thing that fits inside the shaft of 
the first motor. 


7.12 High school competition 


Just for the fun of it, and in order to get kids interested in this field, I would propose 
that someone who has some contact with the high schools think of making some 
kind of high school competition. After all, we haven’t even started in this field, and 
even the kids can write smaller than has ever been written before. They could have 
competition in high schools. The Los Angeles high school could send a pin to the 
Venice high school on which it says, “How’s this?” They get the pin back, and in 
the dot of the “i” it says, “Not so hot.” 


Perhaps this doesn’t excite you to do it, and only economics will do so. Then I 
want to do something; but I can’t do it at the present moment, because I haven’t 
prepared the ground. It is my intention to offer a prize of $1,000 to the first guy 
who can take the information on the page of a book and put it on an area 1/25,000 
smaller in linear scale in such manner that it can be read by an electron microscope. 


And I want to offer another prize — if I can figure out how to phrase it so that 
I don’t get into a mess of arguments about definitions — of another $1,000 to the 
first guy who makes an operating electric motor — a rotating electric motor which 
can be controlled from the outside and, not counting the lead-in wires, is only 1/64 
inch cube. 


I do not expect that such prizes will have to wait very long for claimants. 


This transcript of the classic talk that Richard Feynman gave on December 29th 1959 
at the annual meeting of the American Physical Society at the California Institute of 
Technology (Caltech) was first published in the February 1960 issue of Caltech’s Engineering 
and Science, which owns the copyright. It has been made available with their kind 
permission. 


INFORMATION IS INEVITABLY PHYSICAL 


Rolf Landauer 


Abstract 


Information is inevitably tied to a physical representation, and therefore to all the 
possibilities and restrictions allowed by our real physical universe. The theory of 
computational limits is reviewed in a historical fashion. After some widespread ini- 
tial errors, it was eventually understood that statistical mechanics and elementary 
quantum mechanics do not provide any limits. The energy requirements of the com- 
munications channel are particularly emphasized; it is an area where lower bounds, 
accepted for decades, are circumventable. The utility of the time-modulated po- 
tential going from monostability to bistability and back, is emphasized. Despite 
its use by von Neumann, Feynman, and many others, it has not received broad 
attention, i.e. by those not actually invoking it for their own purposes. I revisit my 
long-standing contention that our real universe does not permit the unlimited chain 
of infallible operations, envisioned in continuum mathematics, and that this has an 
influence on the ultimate nature of physical law. Finally, in the spirit of a volume 
dedicated to Richard Feynman’s impact, I deplore the strong effects of fashions in 
science. 


8.1 Information is Physical 


Information is inevitably tied to a physical representation. It can be engraved on 
stone tablets, denoted by a spin up or down, a charge present or absent, a hole 
punched in a card, or many other alternative physical phenomena. It is not just 
an abstract entity; it does not exist except through a physical embodiment. It is, 
therefore, tied to the laws of physics and the parts available to us in our real physical 
universe. This is a viewpoint which was invoked by Szilard [1] in his analysis of 
Maxwell’s demon. Szilard’s analysis was not all that definitive as far as the demon 
was concerned, but his understanding of the physical nature of information was truly 
pioneering. Even in recent years this viewpoint is still not all that widely accepted. 
Penrose [2], for example, tells us: “. . . devices can yield only approximations to a 
structure that has a deep and ‘computer-independent’ existence of its own.” 


When we learned to count on our sticky little classical fingers, we were misled. 
We thought that an integer had to have a particular and unique value. But in the 
real world, which is quantum mechanical, we can have a coherent superposition 
of a state with two photons and one with five. This is a degree of freedom, a 
possibility, which has to be understood and its utility has to be assessed. The fact 
that its advantages are advertised too unhesitantly by the advocates of quantum 
parallelism (for a collection of papers on this subject, see Ref. [3]) should not blind 
us to the need to examine all the possibilities, as well as the restrictions, which 
come with the physical nature of information. 


The physical nature of information leads us to an analysis of the limits imposed 
on information handling by the laws of physics and by the parts available in the 
universe, and also leads us to the attempt to exploit all the possibilities offered by 
physics. We will start with the discussion of limits in the next section and later, in 
Sec. 8.5, discuss the impact of that on the laws of physics. 


8.2 Limits 


The origin of the modern electronic computer can be found in diverse places; in the 
Jacquard loom, in Babbage’s inventions, or in Hollerith’s machinery for tabulating 
the 1890 U.S. census data. The real momentum, however, came around the end of 
World War II. 1996 saw the fiftieth anniversary of the ENIAC; 1995 saw the fiftieth 
anniversary of my own organization, IBM Research. A good many other related 
significant events stem from those years. 


A concern with the fundamental physical limits of the computer appeared soon 
after the arrival of the computer. Shannon’s information theory [4] had already 
taught us to think about the ultimate limits of information handling, and estab- 
lished a concern with the relation between information and entropy. Unfortunately, 
in its early stages, the more computer-oriented discussions were not particularly 
disciplined. Scientists are proud of their ability to do back-of-the-envelope calculations; 
to react simply and quickly to the essence of a problem. 


Unfortunately, as discussed in detail in Ref. [5], in this area most of the early 
attempts turned out to be wrong. The zig-zag pattern emphasized in the title of [5] 
continues to the present. For example, the possibility of computation by totally 
coherent quantum mechanical machinery [6] was appreciated at a late stage. It 
took several years after that for the invention of quantum parallelism [7]. The 
widespread understanding of the need to handle errors in such machinery has come 
only in the last few years. 


Brillouin’s book [8] typifies the early thinking. Without deprecating the many 
earlier accomplishments of this major scientist, the book left me with the reaction: 
“There must be a better way to think about that.” This reaction was shared by 
my local colleague and collaborator, John Swanson. One result of that concern 
was Ref. [9] pointing out that the computer operations which inevitably demand a 
minimal unavoidable energy dissipation are those that discard information. That 
history has been discussed in Refs. [5] and [10]. It may be worthwhile, however, to 
point out quite how much Ref. [9] defied the prevailing wisdom. It was “known” 
in 1961 that it takes kT ln 2 to ship a bit, and in a computer we do lots of bit 
transmission, even when we do not discard information. (Example: A shift register 
where we simply move bits along.) It was also “known” that it takes energy to 
make a measurement, and a circuit can be presumed to measure its inputs. It took 
a long time, however, after Ref. [9] to rectify the prevalent, but incorrect, notions 
about measurement and communication. 
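
For a sense of scale, the quantity kT ln 2 that appears above can be evaluated numerically. The temperature of 300 K is an assumption made here for illustration; the constants are the SI-defined values of the Boltzmann constant and the electron volt.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23     # Boltzmann constant (SI defined value)
ELECTRON_VOLT_J = 1.602176634e-19    # one electron volt in joules (SI defined value)
temperature_k = 300.0                # ASSUMED room temperature

energy_j = BOLTZMANN_J_PER_K * temperature_k * math.log(2)
energy_mev = energy_j / ELECTRON_VOLT_J * 1e3

print(f"kT ln 2 at {temperature_k:.0f} K = {energy_j:.3e} J = {energy_mev:.1f} meV")
```

At this assumed temperature the result is a few times 10^-21 joules, or roughly 18 meV.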


The fact that computing can be accomplished through a sequence of 1:1 map- 
pings, which do not discard information, was pointed out in Ref. [9]. The consistent 
and clear understanding and utilization of that for reversible computation had to 
wait for Bennett [11]. Ref. [11] showed that computation in classical systems, with 
noise, and with frictional forces proportional to velocity, can be done with arbitrar- 
ily little dissipation, per step. Bennett’s work, showing that computation can be 
done by a sequence of 1:1 mappings, in turn, paved the way for the appreciation 
that a totally coherent quantum mechanical time evolution (which inevitably has to 
be 1:1) can cause interacting bits to change with time as needed in a computer [6]. 
Benioff’s insight, in Ref. [6], is one that, today, may also be hard to appreciate. 
The prevailing belief, at the time, was that the uncertainty principle was a problem 
for computation. Indeed, there were prevalent casual assertions (see Ref. [12] for a 
listing of some of these) that the uncertainty principle specified a minimal energy 
expenditure required by a fast switching event. 
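
The idea of computing through 1:1 mappings can be illustrated with a small sketch. It uses the controlled-controlled-NOT (Toffoli) gate, a standard textbook example chosen here for illustration and not taken from Ref. [9] or Ref. [11]; the function and variable names are likewise hypothetical.

```python
from itertools import product

def toffoli(a, b, c):
    """Controlled-controlled-NOT: flips c only when a and b are both 1."""
    return a, b, c ^ (a & b)

def and_gate(a, b):
    """Ordinary AND: two bits in, one bit out."""
    return a & b

states = list(product((0, 1), repeat=3))
images = [toffoli(*s) for s in states]

# 1:1 mapping: 8 distinct inputs give 8 distinct outputs, so nothing is discarded.
is_one_to_one = len(set(images)) == len(states)

# The gate is its own inverse, so every step can be undone.
is_self_inverse = all(toffoli(*toffoli(*s)) == s for s in states)

# With the third bit preset to 0, the third output carries a AND b.
computes_and = all(
    toffoli(a, b, 0)[2] == and_gate(a, b) for a, b in product((0, 1), repeat=2)
)

# Plain AND is many-to-one: several different inputs all give 0.
and_preimages_of_zero = [
    (a, b) for a, b in product((0, 1), repeat=2) if and_gate(a, b) == 0
]

print(is_one_to_one, is_self_inverse, computes_and, and_preimages_of_zero)
```

The ordinary AND gate maps three distinct inputs to the same output and so discards information, whereas the three-bit gate computes the same function while keeping its inputs, which is what makes the step invertible.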


The notion of reversible computation arose out of discussions concerned with 
purely conceptual questions. Eventually, however, it was realized that the energy 
stored in capacitors in CMOS logic circuits need not be discarded, but could, to a 
large extent, be returned to a suitably designed power supply [13]. An early version 
of this approach, hot-clocking, generally ascribed to Feynman’s Caltech colleague 
Charles Seitz, is described in Chapter 7 of Feynman’s Lectures on Computation [14]. 


In a book closely related to Feynman, it is appropriate to comment on his role 
in understanding quantum mechanical computers. Ref. [15] was generated in 
connection with, and after, a 1981 conference, where Benioff was also present, and 
where Benioff presented his emerging notions about a totally quantum mechanical 
computational process. Quite independently, Feynman’s paper pointed to the dif- 
ficulty that classical computers had in following events in the large Hilbert space 
of quantum mechanical time evolution. Feynman appreciated the greater power of 
a quantum computer, but provided no details suggesting how that might be ac- 
complished. Later on, Feynman [16] enlarged on the work of Benioff and supplied 
his own very appealing and useful description of a quantum computer, without cit- 
ing Benioff. Feynman’s computer is not clocked; there is no external intervention. 
The computation proceeds under its initial kinetic energy, and is launched onto its 
computational track much like an electron wave packet can be sent down along a 
one-dimensional periodic lattice. Remarkably, Feynman never put his two contri- 
butions, [15] and [16], together. He, thus, failed to anticipate Deutsch’s invention 
of quantum parallelism [7]. Quite possibly that was obvious to Feynman, and the 
connection was not pointed out because Feynman gave only very informal lectures, 
leaving it to others to record these in written form. Incidentally, my own 
discussion in Sec. 4 of Ref. [12], triggered by Benioff’s work, anticipated in a crude way, 
the Feynman computer [16]. Unfortunately, it was a somewhat incorrect anticipa- 
tion, assuming that the variable data structure during the computational progress, 
could cause quantum mechanical reflection, i.e. reversal of the computation. In a 
properly designed perfect computer it need not do so. The subject of reflections 
was eventually revisited by Benioff [17]. While Feynman has been influential in 
the development of our understanding of quantum computation, his teacher, John 
Wheeler, has had an equally significant role, even if a somewhat more indirect one. 
Wheeler’s papers will be mentioned in Sec. 8.5. But Wheeler’s influence via his 
students, associates, and active participation in conferences, has helped to shape 
the field. 


In Ref. [18] John Swanson asked how much memory can be obtained from a 
given quantity of storage material. He had the typical cooperative phenomena such 
as ferromagnetism, ferroelectricity and superconductivity in mind. The theory was 
not totally independent of model details; his conclusions do not have universal 
applicability. If we take a limited amount of material and use it to make a few large 
bistable memory elements, they will be very immune to decay by thermal activation 
or tunneling, but will not provide a great deal of storage. If we make very small 
elements, we have a great deal of storage initially. But we lose it very quickly, 
even if we allow redundancy, as invoked by Swanson, to protect against errors. 
Redundancy is effective against small error probabilities, but not very useful once 
most elements have already lost their initial status. Thus, there is an intermediate 
size for memory elements which optimizes the number of stored bits. The optimum 
size is one where the individual storage element is relatively reliable. 
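
The existence of an intermediate optimum can be illustrated with a toy model. This is not Swanson's calculation: the exponential dependence of the flip rate on element size, the value of `attempts`, the barrier per unit of material, and the use of the binary-symmetric-channel capacity as an idealized stand-in for redundancy are all assumptions made here for illustration.

```python
import math

def flip_probability(n, attempts=1e12, barrier_per_unit=1.0):
    """Chance that an element of n units has flipped by the end of the storage period.

    ASSUMED toy model: symmetric two-state element whose
    (flip rate x storage period) equals attempts * exp(-barrier_per_unit * n).
    """
    rate_times_period = attempts * math.exp(-barrier_per_unit * n)
    return 0.5 * -math.expm1(-2.0 * rate_times_period)

def binary_entropy(p):
    """Shannon entropy of a biased coin, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def stored_bits(n, total_units=1_000_000):
    """Recoverable bits: number of elements times capacity per element (idealized coding)."""
    return (total_units / n) * (1.0 - binary_entropy(flip_probability(n)))

sizes = range(1, 201)                 # element sizes to scan, in units of material
best_n = max(sizes, key=stored_bits)

print(f"best element size: {best_n} units")
print(f"flip probability at optimum: {flip_probability(best_n):.4f}")
print(f"stored bits at optimum: {stored_bits(best_n):.0f}")
```

In this sketch very small elements lose essentially all their information, very large elements waste material, and the optimum falls at an intermediate size where each element flips only rarely, which matches the qualitative conclusion stated above.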


Swanson’s calculation did not describe fundamental and unavoidable limits; his 
result depends on device details and has two further shortcomings. He paid no 
attention to the machinery addressing the bits. Furthermore, the calculation had 
to assume that there was a given period, T, over which information had to be 
preserved. This paper was written long before modern random access dynamic 
memory was invented. (Memories with regular refresh operation, e.g. delay lines, 
did exist well before Ref. [18].) The calculation did not allow for frequent inter- 
mediate refresh operations, which read out the bits before too many have changed, 
and reset them all, accordingly. Indeed, the resetting can be done at two different 
levels. First of all, at a systems level, invoking redundancy, to reset those memory 
elements which have become erroneous. But it can also be done at the level of the 
individual element, before it has drifted too far from its intended state. Indeed, in 
the case of tunneling, we now understand that measurements not only allow reset- 
ting; they actually slow down the tunneling process [19]. If the information loss 
occurs by thermal noise, or other noise sources, and information is held in a local 
state of stability, a similar reduction can be achieved. But in a classical system, 
separate measurement and resetting operations are needed. In such a system the 
two local states of stability, denoting 0 and 1, respectively, will typically lie in po- 
tential valleys separated by a saddle point, over which escape from one state to the 
other occurs. A bit on its way to the saddle point, as a result of earlier noise pulses, 
can be restored to its original valley, through a measurement which determines on 
which side of the saddle point the particle is located, followed by a return to the 
appropriate valley. Whether we actually deal with potential valleys, or with more 
active dynamic systems which have local states of stability, does not matter. 


Despite its limits, Swanson’s paper was perceptive and pioneering in other ways. 
It was, very likely, the first broad perception of the use of redundancy in memory, 
even though its practical use was already understood at that time. The paper was 
certainly the first place that tunneling, as a source of information loss for a bit 
utilizing more than one particle or crystal cell, was considered. Tunneling of a 
small ferromagnet from one state of stability to another has become a subject of 
active concern in the last decade, associated with the label Macroscopic Quantum 
Tunneling. This literature rarely acknowledges Swanson’s work. Swanson’s work 
stems from the same period as Feynman’s [20] famous “There’s plenty of room at the 
bottom.” Feynman’s paper, with its proposal of small machines making still smaller 
machines, was that of a supremely gifted visionary and amateur; Swanson’s that of a 
professional in the field. Feynman, for example, foresaw the entire Caltech library of 
120,000 volumes recorded on just one library card, ten years after his paper. Instead, 
since the time of Feynman’s paper, most university libraries have expanded their 
space. A large library on a card poses a number of secondary problems. How many 
users can have access to it simultaneously? Can it be updated every day as new 
acquisitions appear? 


Swanson’s paper [18] appeared after his untimely death in a diving accident, and 
I prepared it for publication. Swanson, in collaboration with me, also addressed the 
escape rate from a metastable state, activated by thermal equilibrium noise [21], 
generalizing an earlier one-dimensional case treated by Kramers [22]. John Swanson 
saw the possibility of this theory; the details were left to me. This paper went 
only modestly beyond an earlier one by Brinkman [23], whose work, unfortunately, 
was not known to us at the time. This process of unintended rediscovery was to 
repeat itself several more times. The most significant of these later contributions is 
recorded in [24]. 


Bistable information holding systems need not, like ferromagnets, be dissipa- 
tionless systems in their steady information holding state. They can be active 
dissipative circuits built out of relays, vacuum tubes, or transistors. How small 
can these be and still hold information effectively, protected against fluctuations? 
Ref. [25] answered this for bistable Esaki diode circuits. Even more than Swanson’s 
memory element theory [18], this was a specialized theory, far from universal appli- 
cability. Nevertheless, it was a prototype of a calculation which started to become 
fashionable in areas unrelated to computation, about a dozen years later. For a 
history of that field see Ref. [26]. 


Science is very much a matter of fashionability. Some topics come into public 
focus, others are ignored. The track started by Refs. [18] and [25] never really caught 
on. Concern with the interface of physics and computation, at a fundamental level, 
has achieved visibility [27]. At the same time, concern with the noise activated 
escape from a state of local stability is also a subject of widespread concern [28]. 
We hope that the intersection of these fields will be rediscovered. At a minimum 
the extension of Swanson’s theory, sketched qualitatively above, needs to be stated 
more precisely. 


8.3 Energy Requirements of the Communications Channel 


This subject, like almost all branches of our field, has had a convoluted history, 
which I will not recapitulate in detail. The fact that in practice electromagnetic 
waves are used has misled many to assume that is essential, and to go even further 
and assume that the energy in the signal has to be dissipated. Actually, of course, 
we can use mail or floppy disks, and have many other ways to ship information. 


Essentially no energy has to be consumed to send a bit. That answer, for classical 
bits sent by classical machinery, was given a decade ago [29, 30]. The answer for 
classical bits sent by quantum mechanical machinery was given more recently [31], 
and after that for quantum bits (qubits) [32]. 


The quantum communications channel has received a great deal of attention, and 
we cite here only the most recent investigations [33]. But that work, despite its very 
real contribution and importance, fails to ask about ultimate unavoidable energy 
requirements. Actually, if we are interested in minimizing energy requirements, we 
may well want to avoid the quantum channel altogether. For a given signal energy 
(ignoring for the moment whether it needs to be consumed) we can send more bits 
if we divide the energy between several channels [34]. That requires a lower bit rate 
in each channel, and therefore the quanta associated with the message have a lower 
energy. If we go far enough in that direction, then the channels become classical 
where kT, rather than ħω, matters. 
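
As a rough numerical illustration of where that crossover lies, the following sketch computes the frequency at which one quantum ħω (equivalently hf) equals the thermal energy kT. The function names and the example temperature of 300 K are assumptions chosen here for illustration; they do not come from the cited papers.

```python
# Physical constants in SI units (exact values in the 2019 SI).
H_PLANCK = 6.62607015e-34    # Planck constant, J*s
K_BOLTZMANN = 1.380649e-23   # Boltzmann constant, J/K


def crossover_frequency(temperature_kelvin):
    """Frequency f at which one quantum h*f equals the thermal energy k*T."""
    return K_BOLTZMANN * temperature_kelvin / H_PLANCK


def is_classical(frequency_hz, temperature_kelvin):
    """True when the quantum energy h*f lies below k*T, so that kT matters."""
    return H_PLANCK * frequency_hz < K_BOLTZMANN * temperature_kelvin


if __name__ == "__main__":
    f_room = crossover_frequency(300.0)
    print(f"crossover at 300 K: {f_room:.3e} Hz")
    print("1 GHz channel classical at 300 K:", is_classical(1e9, 300.0))
```

Channels whose characteristic frequencies lie well below this crossover are in the regime where kT dominates; spreading a fixed signal energy over more, slower channels moves each of them in that direction.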


This author’s proposals for low energy communication have all been of a concep- 
tual nature, and very far from practicality. Are there more practical embodiments? 
Even if there are, is energy so important that it is worth going to the extra com- 
plexity that is likely to be involved? The history of reversible computation provides 
an interesting lesson in this context. It, too, was originally a purely conceptual 
innovation. Nevertheless, as already stated in Sec. 8.1, closely related proposals for 
saving energy in CMOS logic circuits have appeared [13]. These proposals were 
not oriented at saving an energy of order kT per step, but only at cutting down 
on some of the much larger power requirements of real circuits. The real utility 
of these proposals is still far from clear. Perhaps the communications channel can 
follow a similar history. 


Fig. 8.1. Time-dependent potential well going from single minimum at A to a deeply 
bistable state at F, and later returning to A. The curves are displaced vertically relative 
to one another for clarity. The variable g gives the position of the particle in the well. 


It is also worth noting that all of the conceptual proposals for low energy com- 
munication channels put forth so far involve mechanical motion. Can it possibly 
be done by invoking nonlinear photon interactions? There, too, we have a historical 
precedent. For a number of years all of the detailed embodiments of apparatus for 
reversible computation were mechanical or chemical in form. I asked myself whether 
there were deep reasons for the absence of electrical versions. But Likharev [35] 
invented a Josephson junction version; there were no deep reasons, only a lack of 
invention. 


8.4 Time-Modulated Potential 


Particles in a classical time-dependent potential going from a monostable state to a 
deeply bistable state and back to the monostable state can be used to carry out logic. 
The time-dependent potential is illustrated in Fig. 8.1, and will be taken as heavily 
damped. A particle in the well, approaching the bifurcation point (where the well is 
flat, as in curve C of Fig. 8.1) from the monostable state, is easily influenced, pushing 
it toward one of the two developing pockets or the other. The biasing influence 
comes from other particles which are already locked into the deeply bistable state 
and coupled to the one under consideration. The time dependence of the potential, 
per se, is not a source of dissipation. That can only come from motion of the 
particle against the viscous forces associated with the potential. We take these to 
be proportional to the particle velocity. If the particle’s motion is kept slow, then 
very little energy is consumed per event. The time-dependent potential can be 
generated, for example in the case of a charged particle, by moving charges to or 
away from the information bearing particle. I have invoked this approach repeatedly 
in my papers. For example, a linear sequence of such wells, each influencing the next, 
can be a communication channel with arbitrarily little dissipation per transmitted 
bit [30]. The method is not immune to noise-induced errors, but by proper “design” 
choices these can be made as small as desired. 
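
The behavior described in this paragraph can be sketched numerically. The example below is a minimal illustration, not a model taken from any of the cited papers: it assumes a quartic potential V(q) = q^4/4 - a q^2/2 - b q, where the hypothetical parameter a is ramped from negative (monostable) to positive (bistable) and the small bias b stands in for the influence of a neighboring, already locked particle. Motion is heavily damped with a viscous force proportional to velocity, and thermal noise is omitted.

```python
def force(q, a, bias):
    """Negative gradient of the assumed potential V(q) = q**4/4 - a*q**2/2 - bias*q."""
    return -(q**3 - a * q - bias)


def run_cycle(bias, a_start=-1.0, a_end=1.0, steps=4000, dt=0.01):
    """Overdamped motion dq/dt = force(q) while the well is slowly deformed.

    Returns the final position and the energy dissipated against the
    viscous force (unit friction coefficient, so power = velocity**2).
    """
    q = 0.0
    dissipated = 0.0
    for i in range(steps):
        # Linear ramp of the well shape from monostable to bistable.
        a = a_start + (a_end - a_start) * i / (steps - 1)
        v = force(q, a, bias)        # velocity for unit friction
        dissipated += v * v * dt     # friction force times displacement
        q += v * dt                  # explicit Euler step
    return q, dissipated


if __name__ == "__main__":
    for b in (0.01, -0.01):
        q_final, energy = run_cycle(b)
        print(f"bias {b:+.2f}: final q = {q_final:+.3f}, dissipated = {energy:.4f}")
    # A slower ramp over the same range dissipates less energy.
    print("fast ramp:", run_cycle(0.01, steps=2000)[1])
    print("slow ramp:", run_cycle(0.01, steps=40000)[1])
```

A weak bias applied near the bifurcation point decides which pocket the particle ends up in, and stretching the ramp over a longer time lowers the energy lost to friction, in line with the statement that slow motion consumes very little energy per event.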


The time-modulated potential approach is not my invention; its history will be 
described. I have, however, invoked it more often than anyone else. Despite the fact 
that a long chain of investigators, including von Neumann and Feynman, have paid 
attention to this method, it has received amazingly little general recognition. You 
will not find it in textbooks or broad review articles. The notion stems from simulta- 
neous and independent inventions by von Neumann [36] and Goto [37]. These come 
from a time when junction transistors were limited to low speeds and alternative 
high speed logic approaches were sought. The inventions use the development of a 
subharmonic through parametric excitation of a non-linear resonant circuit, driven 
at twice its approximate resonant frequency. This is a bistable system; the sub- 
harmonic can develop with two possible phases, 180° apart. Attempts to develop 
this approach took place at several different U.S. laboratories. In Japan actual 
working computers using the non-linear susceptibility of magnetic cores were built. 
The basic tool for this approach is majority logic. Three input circuits influence 
a subsequent one, which is controlled by the majority phase of its three inputs. 
The approach is beset by two problems when considered as a serious technological 
candidate. First of all, majority logic is demanding on tolerances, i.e. on the devia- 
tion allowed for signals from their supposed ideal value. Furthermore, the approach 
requires a precise clock signal delivered to every stage. Logic proposals which need 
to clock every stage have never been successful. Goto, subsequently, adapted the 
approach to Esaki diode circuits [38]. 
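
A minimal sketch of three-input majority logic follows. The two subharmonic phases, 180° apart, are represented here by the integers 0 and 1; that encoding and the function names are assumptions made for illustration. The reduction to AND and OR by pinning one input is a standard property of majority gates and is not taken from the cited inventions.

```python
from itertools import product


def majority(a, b, c):
    """Phase (0 or 1) adopted by a stage driven by three input phases."""
    return 1 if a + b + c >= 2 else 0


# Full truth table: eight input combinations map onto only two outputs,
# so the operation cannot be inverted (it is not 1:1).
TABLE = {bits: majority(*bits) for bits in product((0, 1), repeat=3)}


def and_gate(a, b):
    """Majority with one input pinned to 0 behaves as AND."""
    return majority(a, b, 0)


def or_gate(a, b):
    """Majority with one input pinned to 1 behaves as OR."""
    return majority(a, b, 1)


if __name__ == "__main__":
    for bits, out in TABLE.items():
        print(bits, "->", out)
```

Because each output is decided by a two-to-one vote, a single input whose amplitude drifts too far from its ideal value can tip the result, which is one way to see why the scheme is demanding on tolerances.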


The invention was adapted to time-dependent potentials by Keyes and Landauer 
[39], but at a time when reversible computation was not yet understood. The fact 
that the approach could be used in reversible computation, and could be carried out 
with Josephson junction circuits, was pointed out by Likharev [35]. There has been 
some tendency to think of reversible computation as “Brownian” computers, i.e. 
computers which move back and forth diffusively along their trajectory, but with a 
net drift velocity due to an applied driving force. Likharev’s invention showed that 
reversible computers could be clocked, moving forward at a steady and predictable 
rate. 


Bennett used the approach to point out that copying can be done reversibly, 
with arbitrarily little energy dissipation [40]. In Bennett’s case a ferromagnet was 
taken up through its Curie point, and then taken back down under the influence of 
another magnetized bit. Feynman, in connection with Figs. 5.15 and 5.16 of Chap- 
ter 5 in Ref. [14], once again explains the copying process, invoking time-modulated 
potentials, with reference to Bennett. Merkle [41] used the approach for mechan- 
ically buckled cards, under an oscillating compressive force, bending out one way 
or the other. Recently, Lent and Tougaw [43] used the method in connection with 
their Quantum Cellular Automata, which have two electrons at opposite corners of 
a square. The field due to one of these bistable cells polarizes an adjacent cell. In 
the most recent version of this approach the tunneling barriers, which allow elec- 
trons to move around the square, are externally controlled. Despite the utilization 
of tunneling in the proposal of Ref. [43], the time-modulated potential scheme is 
intrinsically a dissipative scheme, invoking relaxation to the ground state as the 
bifurcation develops. It is not directly suitable for totally coherent quantum me- 
chanical computation. That can be recognized from the majority logic operation, 
which is not 1:1, whereas a Hamiltonian method must be 1:1. A totally coherent 
quantum mechanical proposal for controlled tunneling has been put forth [32], but 
is very different in character. As is true of all the existing quantum logic propos- 
als, it does not provide exactly the required unitary transformation, and does not 
provide any natural error immunity, allowing for small departures from the design 
specifications. Likharev revisited the time-modulated potential with Korotkov in 
Ref. [42]. The shift register proposed there is an example of a reversible commu- 
nications link discussed in Sec. 8.3. As in Ref. [43], the motion from one pocket 
to another involves quantum mechanical tunneling, but the overall process is not 
quantum mechanically coherent. Relaxation to the ground state is essential. 


8.5 Broader Implication of Limits 


How far can technology improve? Limits are of interest as declarations of bound- 
aries for that, or as a declaration of their absence. Computational limits, however, 
also have a more fundamental scientific significance to be discussed in this section. 
The execution of all information handling operations, including all mathematical 
operations, has to take place in our physical world. We have all been indoctrinated 
by the mathematicians, given ε ∃ N, stating that with enough successive operations, 
any accuracy requirement can be met. It is questionable whether this can be carried 
out in our real physical world, where it needs to be done. This is a theme which 
I have visited in a number of papers, and here cite only an early [44] and a recent 
item [45]. 


Arbitrary precision requires an unlimited memory, and this is unlikely to exist 
in a finite universe. Even if the universe is unlimited, it seems unlikely that we 
can collect arbitrarily large parts of it into an organized memory structure, and 
even if we grant the availability of unlimited memory, it still seems unlikely that 
we can have an unlimited sequence of operations, each guaranteed to be totally 
free of error. Thus, we are questioning that the mathematicians’ continuum and 
real number system reflect executable operations. That is, of course, not a totally 
unprecedented reaction, and related to many other existing views, though probably 
not exactly equivalent to any of them. Ref. [45] stressed the relationship to John 
Wheeler’s work. Feynman, in Ref. [15] comes close to suggesting that a bounded 
volume of space and time is associated with a limited amount of information, but 
does not quite say that. It is my recollection that in his actual lecture in 1981 that 
was stated much more clearly and unconditionally. 


I have already alluded to quantum parallelism, utilizing qubits which can be 
in a quantum superposition of a 0 and 1 state. My published papers give a very 
conservative appraisal of the realizability of quantum parallelism. Nevertheless, I 
welcome it, as a celebration of the physical nature of information. Even if quantum 
parallelism never comes to pass, those who prove theorems about the minimum 
number of steps required by an algorithm, must take its possibility into account. 


What can replace the real number system and allow for the limited precision 
of real physical operations? This author does not know. It does not necessarily 
mean algorithms with a limited number of bits. The limits are likely to appear in a 
more statistical fashion. The laws of physics are algorithms for calculation, and as 
stressed by this author, in their proper form must respect the limited information 
handling capabilities of the real universe. I am often asked: “Does our limited 
ability to describe what the physical world is doing prevent that world from doing 
its own thing more precisely? Isn’t it just our knowledge that is limited?” My 
answer to that: “Behavior which cannot be followed, described, or observed is not 
a matter of science. If I am told that seven angels are on the head of a pin and that 
angels are not detectable, I cannot call that an erroneous assertion. But it is not a 
matter for science.” 


What is the likely impact of limited precision on science? In earlier papers I 
have emphasized that this may be related to the ultimate source of irreversibility 
and fluctuation in the real world, a world where we can readily observe departure 
from Hamiltonian behavior. This is admittedly a very speculative conjecture. But 
limited precision may also underlie the apparent classical behavior manifested by 
the events around us. There are a great many theories which claim to explain 
that [46], and we can cite only a few, including some skeptical reactions. Some 
of these explanations may well be correct, and there may not be a need to say 
more. Nevertheless, a totally quantum mechanical behavior in systems with some 
complexity and followed for some time, requires a precise evaluation of phases for 
the competing histories, for the competing Feynman paths. In a world with limited 
precision relative phases will, eventually, get lost and this will lead to classical 
behavior. As in the discussion of noise and irreversibility, the limited precision acts 
as if the universe had an unpredictable environment with which it interacts. 


At a minimum, however, we caution those who invoke the wave function of the 
whole universe. How can that wave function be recorded, unless you have a second 
and separate universe available for that? 


I am suggesting that, contrary to our prevailing views, the laws of physics did 
not precede the universe and control it, but are part of it. Once again, this is not a 
view totally orthogonal to that of others. Wheeler [47] has stated that the laws of 
physics result from quantum measurement on the universe. More recently, Smolin 
has suggested that the laws evolved with the universe [48]. Paul Benioff [49] stresses 
that the quantum mechanical physics of our computational apparatus is part of a 
totally consistent picture of physics. Indeed, this interest in self-consistency was, 
apparently, the motivation for Benioff’s original concern with quantum computation 
[6]. 


8.6 Fashions in Science 


Fashions are not totally unreasonable. We cannot each go our separate way; we need 
communities for interaction. After a perceptive pioneer sees a new concept, it takes 
time for the building of a consensus that it is important. Without that consensus we 
cannot expect grants, conference invitations, promotions, or acceptance by Physical 
Review Letters. Fashions, therefore, to some extent, simply represent the fact 
that it takes time to develop the conviction that an area is ripe for exploitation. 
Nevertheless, it seems totally apparent that the positive feedback involved in the 
formation of fashions has gotten out of hand. This paper is not aimed at the 
sociology of science and will not try to analyze the causes, nor the possible and 
unlikely cures. Feynman, of course, was very far from a follower of fashions, in 
science, or in other ways. Indeed, the many Feynman anecdotes picture someone 
who enjoyed an unconventional role. His There’s Plenty of Room at the Bottom [20] 
may not have been a totally correct vision in all its details, but it was hardly typical 
of someone with Feynman’s range of research activities. His later work on quantum 
computation anticipated what was to become a fashionable field. It was far from 
that when he generated Refs. [15] and [16]. 


The physics of computation, viewed at a fundamental level, was an almost in- 
visible field until a 1981 conference at MIT, which included Refs. [12, 15, 35, 40] 
as well as papers by Benioff and Wheeler. Dyson also was a participant, as were a 
number of noted computer scientists. One of these, Konrad Zuse, presented his view 
that the universe is a digital computer [50]; a view also espoused by Ed Fredkin, 
one of the session’s organizers. 


The field as a whole has achieved visibility, without having become a really major 
fashion. But the subfields within the area have not developed equally. Almost all of 
the visibility, in recent years, has come from the study of quantum information, and 
particularly from quantum parallelism. It is a history we have seen elsewhere. Frac- 
tals, chaos, self-organized criticality are just a few examples of fields and concepts 
which received belated recognition through the path-breaking insights of pioneers. 
But then they went on to become industries. Industries which are fueled at the 
expense of other ideas which deserve to be nourished. Not all fashionable fields are 
the outgrowth of key new insights; some represent public relations efforts with little 
substance. 


When we organized the 1981 MIT session, we intentionally encouraged a di- 
verse participation. That, inevitably, brought in some contributions which were 
recognizably faulty. If we had not done that, it would have been hard to collect 
a reasonably sized group. But additionally, we knew that what was nonsense to 
some would be visionary to others. I cannot help but contrast the diversity of that 
session to some of the many recent sessions which have concentrated on the role of 
quantum mechanical entanglement in information processing. The narrowing repre- 
sents real progress; we now understand more about what counts. But, at the same 
time, the narrowing strengthens fashionability in science. A carefully selected group 
reinforces its existing values, and declares to science journalists that their stuff is 
what really counts. I hope that quantum information can receive the attention it 
deserves, without eclipsing the other questions we have touched in the preceding 
sections. 
SCALING OF MOS TECHNOLOGY TO 
SUBMICROMETER FEATURE SIZES 


Carver A. Mead * 


Abstract 


Industries based on MOS technology now play a prominent role in the developed and 
the developing world. More importantly, MOS technology drives a large proportion 
of innovation in many technologies. It is likely that the course of technological 
development depends more on the capability of MOS technology than on any other 
technical factor. Therefore, it is worthwhile investigating the nature and limits of 
future improvements to MOS fabrication. The key to improved MOS technology 
is reduction in feature size. Reduction in feature size, and the attendant changes 
in device behaviour, will shape the nature of effective uses of the technology at the 
system level. This paper reviews recent, and historical, data on feature scaling and 
device behavior, and attempts to predict the limits to this scaling. We conclude 
with some remarks on the system-level implications of feature size as the minimum 
size approaches physical limits. 


9.1 Introduction 


It is always difficult to predict the future; few attempts to do so have met with 
resounding success. One remarkable example of successful prediction is the ex- 
ponential increase in complexity of integrated circuits, first noted by Gordon E. 
Moore. As we contemplate the ongoing evolution of this great technology, many 
questions arise: Can the trend continue? Will single-chip systems attain levels of 
complexity that render present system architectures unworkable [1]? Will digital 
techniques completely replace analog methods [2]? The answers to these questions 
depend critically on the properties of the individual transistors that provide the 
essential active functions, without which no interesting system behavior is possible. 
Integrated-circuit density is increased by a reduction in the size of elementary fea- 
tures of the underlying structures; therefore, any discussion of the capabilities of 
future technologies must rely on an understanding of how the properties of transis- 
tors evolve as the transistors’ dimensions are made smaller. 


*Reproduced from Journal of VLSI Signal Processing, 8, 9-25 (1994) Kluwer Academic Pub- 
lishers, Boston. Manufactured in The Netherlands. 


Elsewhere [3], we described the factors that limit how small an MOS transistor 
can be and still operate properly. That discussion will not be repeated here, but I 
will outline the major issues: 


1. For the device current to be primarily controlled by the gate, the device should 
not be punched through; that is, the sum of the source and drain depletion 
layers should be less than the geometric channel length. As a direct conse- 
quence of this requirement, the bulk doping must increase as dimensions are 
decreased. 


2. Increasing the bulk doping has two important consequences: 


a. Junction breakdown voltage is lowered. 


b. A larger electric field is required in the gate oxide to obtain a given change 
in surface potential. 


Because of 2a, the operating voltage must be reduced. So that sufficient electric 
field can be obtained with a lower operating voltage, the gate oxide must be made 
thinner. Thus, it is inevitable that, as the minification process is continued, both 
drain depletion layer and gate oxide will become thin enough that electron tunneling 
through them will become comparable with other device currents. In 1971, when 
our original study [3] was written, we described a device of 0.15 micrometer (µ) 
channel length, having a 50 Angstrom (Å) gate oxide. Although we were confident 
that a device of this size could be made to work, we were not at all sure that smaller 
devices could be made viable. 


Over the ensuing 22 years, feature sizes have evolved from 6 to 0.6 µ and the 
trend shows no sign of abating [4-10]. In this paper, I shall examine what we have 
learned from the past 22 years of technology evolution, and shall discuss to what 
extent these same trends may continue into the future. I shall conclude that we can 
safely count on at least one more order of magnitude of scaling, with a concomitant 
increase in both density and performance. Several of the conclusions of this study 
were reached independently by Hu [11]. 


9.2 Scaling Approach 


Fig. 9.1. Gate-oxide thickness as a function of feature size. The solid circles are production 
processes in silicon-gate technology, starting in 1970. Triangles are processes reported in 
the literature. Solid squares are the two most advanced devices described in our previous 
study [3]. The solid line is the analytic expression used in this study (Equation 9.1). 


In Figure 9.1, I have plotted the historic trend of gate-oxide thickness t_ox as a 
function of l, the minimum feature size of the process. The trend can be expressed 
accurately as 

$$t_{ox} = 210\, l^{0.77}$$

where the feature size is in µ, and the gate-oxide thickness is in Å. This observation 
suggests that it may be fruitful to express all important process parameters as 
powers of the feature size, and to determine whether there is a scaling of this form 
that allows sensible process evolution to dimensions well below 0.1 µ. To prevent 
the gate oxide thickness from becoming thinner than a single atomic layer, I have 
chosen a scaling of the form 


$$t_{ox} = \max(210\, l^{0.77},\ 140\, l^{0.55}) \qquad (9.1)$$


This expression is plotted as the solid line in Figure 9.1. In reviewing the historic 
trend, it is clear that we expressed previously [3] more concern with gate-oxide 
tunneling than has been justified by the experience accumulated through the inter- 
vening years. It is conceivable that I am repeating the same bit of paranoia here. In 
any case, if oxide thickness continues to decrease at the present rate, the resulting 
devices will be somewhat more capable than those I present. 


The oxide thickness and feature size together determine the gate-oxide capaci- 
tance C_g of a minimum-sized device: 

$$C_g = \frac{\epsilon_{ox}\, l^2}{t_{ox}}$$


Fig. 9.2. Power-supply voltage as a function of feature size. The solid line is the analytic 
expression used in this study (Equation 9.2). 

The historic trend in supply voltage V is shown in Figure 9.2. This trend is not as 
smooth as the trend in oxide thickness, due to the long period of standardization at 
5 volts (V). It is clear, however, that modern submicrometer devices operate better 
on lower voltages [7, 12], and that this trend to lower voltages must continue. The 
scaling I use in this study is 


$$V = 5\, l^{0.75} \qquad (9.2)$$

This expression is plotted as the solid line in Figure 9.2. 


Once we have the gate-oxide capacitance and supply voltage, we can estimate 
the energy W_s stored on the gate of a minimum-sized transistor at any given feature 
size. I have slightly overestimated the stored energy as 

$$W_s = \frac{1}{2} C_g V^2 \qquad (9.3)$$

For the scaling laws given here, the stored energy (in Joules) works out to be 

$$W_s = 2.2 \times 10^{-14}\, l^{2.73} \qquad (9.4)$$


This expression is plotted as the long solid line in Figure 9.3. Even with the 
slight “kink” introduced by Equation 9.1, this expression is a good abstraction of 
the actual energy over the entire range of the plot. In the central section of historic 
data, however, the constant 5-V power-supply voltage has established a trend with 
much less dependence on feature size. 


Fig. 9.3. Energy stored on the gate of a minimum-sized transistor as a function of feature 
size. We compute the points from Equation 9.3 using oxide-thickness values from Figure 9.1 
and the supply-voltage values from Figure 9.2. The solid line is the analytic expression used 
in this study (Equation 9.4). Also shown for reference are the thermal energy kT at room 
temperature, and the quantum-level spacing for electrons in the channel with momenta in 
the direction of current flow. 


This shorter trend is well represented by the expression 


$$W_s \approx 2 \times 10^{-14}\, l^{1.22} \qquad (9.5)$$


Also shown for reference on Figure 9.3 is the thermal energy kT, and the spacing 
of levels in the channel with momenta in the direction of current flow. It is clear 
that the stored energy is more than 10 kT even at feature sizes of 0.01 µ. 
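
The chain from feature size to stored energy can be followed numerically. The sketch below is illustrative only: the exponents 0.77, 0.55 and 0.75 and the fit 2.2e-14 l^2.73 are taken here as assumptions for Equations 9.1, 9.2 and 9.4, the oxide relative permittivity of 3.9 is a standard textbook value rather than one quoted in this paper, and all function names are hypothetical.

```python
EPS_OX = 3.9 * 8.854e-14        # F/cm, SiO2 permittivity (assumed relative value 3.9)
KT_ROOM = 1.380649e-23 * 300.0  # J, thermal energy at an assumed 300 K


def oxide_thickness_angstrom(l_um):
    """Gate-oxide thickness in Angstrom for feature size l in micrometers."""
    return max(210.0 * l_um**0.77, 140.0 * l_um**0.55)


def supply_voltage(l_um):
    """Supply voltage in volts (assumed exponent 0.75)."""
    return 5.0 * l_um**0.75


def gate_capacitance(l_um):
    """Parallel-plate capacitance in farads of an l-by-l gate."""
    area_cm2 = (l_um * 1e-4) ** 2
    thickness_cm = oxide_thickness_angstrom(l_um) * 1e-8
    return EPS_OX * area_cm2 / thickness_cm


def stored_energy(l_um):
    """Energy in joules stored on the gate, (1/2) C V^2."""
    return 0.5 * gate_capacitance(l_um) * supply_voltage(l_um) ** 2


def stored_energy_fit(l_um):
    """Single power-law abstraction of the stored energy (assumed fit)."""
    return 2.2e-14 * l_um**2.73


if __name__ == "__main__":
    for l_um in (1.0, 0.3, 0.1, 0.03, 0.01):
        w = stored_energy(l_um)
        print(f"l = {l_um:5.2f} um  W_s = {w:.2e} J  "
              f"W_s/kT = {w / KT_ROOM:.1f}  fit/kT = {stored_energy_fit(l_um) / KT_ROOM:.1f}")
```

At 1 µ the direct calculation gives roughly 2e-14 J, slightly below the fitted constant, which is consistent with the remark that the analytic expression somewhat overestimates the stored energy.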


The minimum stored energy is an interesting quantity because it sets the scale 
for the switching energy dissipated in a digital system. The energy per operation of 
computation-intensive digital chips is compared with the minimum stored energy 
in Figure 9.4. The system energy per operation is four to six orders of magnitude 
higher than the minimum stored energy, and can be bounded by the two solid trend 
lines: 

$$W_{max} = 1.15 \times 10^{-8}\, l^{3.4} \qquad (9.6)$$

$$W_{min} = 2.5 \times 10^{-10}\, l^{3.25} \qquad (9.7)$$


Fig. 9.4. Energy dissipated per operation at the chip level. Filled circles are data taken 
from the literature and from manufacturers’ data sheets. Examples are all computation- 
intensive single chips, such as multipliers, digital signal processors, and similar devices. 
So that the data could be plotted on a single scale, all values were normalized to 8 x 
8 multiply-add operations, assuming that the energy is proportional to the product of 
the word lengths of the multiplicand and multiplier. Minimum and maximum trend lines 
shown are Equations 9.7 and 9.6. Also shown for reference are the data of Figure 9.3. 


The overall system trend is steeper than that for minimum stored energy, presum- 
ably because designers have become more skilled over the years, and processes have 
an ever increasing set of features on which designers can draw (multiple levels of 
metal, for example). A 5-V subtrend is clearly discernible in the system data as 
well. 
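
The gap between system energy and stored energy can be checked with a few lines of arithmetic. This sketch assumes the trend-line coefficients and exponents 1.15e-8 l^3.4, 2.5e-10 l^3.25 and 2.2e-14 l^2.73 (feature size in micrometers, energy in joules); these values and the function names are assumptions used for illustration.

```python
import math


def w_max(l_um):
    """Upper trend line for energy per operation, in joules (assumed fit)."""
    return 1.15e-8 * l_um**3.4


def w_min(l_um):
    """Lower trend line for energy per operation, in joules (assumed fit)."""
    return 2.5e-10 * l_um**3.25


def w_stored(l_um):
    """Minimum stored energy on a gate, in joules (assumed fit)."""
    return 2.2e-14 * l_um**2.73


def orders_above_stored(l_um):
    """Orders of magnitude by which the two bounds exceed the stored energy."""
    ws = w_stored(l_um)
    return math.log10(w_min(l_um) / ws), math.log10(w_max(l_um) / ws)


if __name__ == "__main__":
    for l_um in (6.0, 1.0, 0.6):
        low, high = orders_above_stored(l_um)
        print(f"l = {l_um:3.1f} um: system energy is {low:.1f} to {high:.1f} orders above W_s")
```

Because the system exponents are larger than the stored-energy exponent, the gap narrows as the feature size shrinks, which is the sense in which the system trend is steeper.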


Fig. 9.5. Oxide tunneling current as a function of electric field. The open circles are from 
the original work of Lenzlinger and Snow [13]. Filled circles are from the recent work of 
Suné et al. [15]. Filled triangles are from Hori et al. [14]. The solid line is the analytical 
expression used in this study (Equation 9.8). The filled square is inferred from Iwase et 
al. [10], but is not directly comparable with the other data because it was taken from 
a transistor drain characteristic, and may be corrupted with other effects such as gate- 
enhanced drain tunneling. The gate current was not reported separately, so this value 
shown represents a worst-case estimate. 


With the information on hand, we can determine the tunneling current density 
J_ox through the gate oxide [13-15], making the worst-case assumption that the 
entire supply voltage appears across the entire gate area: 

$$J_{ox} = J_0 E^2 e^{-k t_{ox}} \qquad (9.8)$$


where Jo = 6.5 x 10!° A/V/cm? was adjusted to match experimental data, as shown 
in Figure 9.5. The imaginary part of the wave vector k is given by 


$$k = \frac{2 k_0 \Phi}{3 V}\left[1 - \left(1 - \frac{\min(V, \Phi)}{\Phi}\right)^{3/2}\right] \qquad (9.9)$$

These expressions are valid for voltages both above and below the barrier potential 
Φ, which was taken to be 3.2 V. The preexponential constant k_0 = 1.2 Å⁻¹ was 
used. It is comforting to note that oxide tunneling data are available over the entire 
range of electric fields that will be encountered down to the smallest dimensions 
studied here. It will be helpful, however, to have actual experimental data in the 10 
Å range. For these extremely thin oxides, it will be essential to take into account 
the quantum corrections discussed in Suné et al. [15]. 


Fig. 9.6. Substrate doping as a function of feature size. The solid line is the analytical 
expression used in this study (Equation 9.10). Filled triangles represent processes reported 
in the literature. The two solid squares are the two smallest transistor designs shown in 
our earlier work [3]. 
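
The voltage dependence of the oxide tunneling exponent can be explored with a short sketch. It assumes that the wave vector k follows the standard WKB result for a barrier of height Φ tilted by the applied voltage V, with Φ = 3.2 V and k_0 = 1.2 per Angstrom; that functional form, the interpretation of k_0 as an inverse length, and the function names are assumptions made here, and the prefactor J_0 is deliberately left out.

```python
import math

PHI_BARRIER = 3.2   # V, barrier potential quoted in the text
K0 = 1.2            # 1/Angstrom, assumed units for the constant k_0


def wave_vector(v_ox):
    """Effective decay constant k (1/Angstrom) for oxide voltage v_ox > 0.

    Assumed WKB form: valid below the barrier (trapezoidal barrier) and
    above it (triangular, Fowler-Nordheim-like barrier).
    """
    x = min(v_ox, PHI_BARRIER) / PHI_BARRIER
    return (2.0 * K0 * PHI_BARRIER / (3.0 * v_ox)) * (1.0 - (1.0 - x) ** 1.5)


def attenuation(v_ox, t_ox_angstrom):
    """Exponential factor exp(-k * t_ox) in the tunneling current density."""
    return math.exp(-wave_vector(v_ox) * t_ox_angstrom)


if __name__ == "__main__":
    for v in (0.4, 1.0, 3.2, 5.0):
        print(f"V = {v:3.1f} V: k = {wave_vector(v):.3f} /A, "
              f"exp(-k t) for 20 A oxide = {attenuation(v, 20.0):.2e}")
```

In this form k approaches k_0 at small voltages and falls off as 1/V once the voltage exceeds the barrier, so a thin oxide at low voltage is attenuated by nearly the full exp(-k_0 t_ox).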


The other major source of parasitic current is tunneling through the drain junc- 
tion. The junction-tunneling current density J_j is critically dependent on the sub- 
strate acceptor concentration n, which must be increased to avoid punch-through 
as device dimensions are decreased [16-22]. The scaling law used in this study is 
plotted in Figure 9.6: 


n=4x 10'%-18 (9.10) 


Given the doping density n, we can compute the depletion-layer thickness x for any
potential $\psi$ relative to substrate using the usual step-junction approximation:


$$x = \sqrt{\frac{2 \epsilon_{si} \psi}{q n}} \qquad (9.11)$$


Fig. 9.7. Junction-tunneling current density as a function of peak electric field in the
junction. The filled triangles are from alloy tunnel diodes, which were reported as step
junctions by Chynoweth et al. [16]. The filled circles are from diffused emitter-base
junctions reported as graded junctions by Fair and Wivell [19]. These were the only references
that I was able to locate for electric fields in the range encountered in the finest feature
sizes considered in this study. Some data are shown by Reisch [22], but not enough
information is given to allow direct comparison with the other data. For reference, the solid
square represents the parameters encountered in the 0.03-μ device described in this study.
The solid line is the analytical expression used in this study (Equation 9.10).


The corresponding depletion-layer capacitance C is given by


$$C = \frac{\epsilon_{si}}{x}$$


We can determine the maximum electric field in the drain junction from the junction
voltage, which in the worst case will be the supply voltage plus the built-in voltage:


$$E = \sqrt{\frac{2 q n (V + V_b)}{\epsilon_{si}}}$$
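

As a numerical sketch of the step-junction relations above, the following Python fragment evaluates the depletion thickness, the capacitance per unit area, and the peak junction field. The function names, the silicon permittivity constant, and the example doping of 1e18 per cubic centimeter are illustrative assumptions, not values taken from this study.

```python
import math

Q_E = 1.602e-19             # electron charge, C
EPS_SI = 11.7 * 8.854e-14   # permittivity of silicon, F/cm (assumed constant)


def depletion_thickness_cm(n_cm3, psi_volts):
    """Step-junction depletion thickness x = sqrt(2*eps*psi / (q*n)), in cm."""
    return math.sqrt(2.0 * EPS_SI * psi_volts / (Q_E * n_cm3))


def depletion_capacitance(n_cm3, psi_volts):
    """Depletion-layer capacitance per unit area C = eps / x, in F/cm^2."""
    return EPS_SI / depletion_thickness_cm(n_cm3, psi_volts)


def max_junction_field(n_cm3, v_supply, v_builtin=1.1):
    """Peak field for a junction voltage of v_supply + v_builtin, in V/cm."""
    return math.sqrt(2.0 * Q_E * n_cm3 * (v_supply + v_builtin) / EPS_SI)


# Hypothetical example: 1e18 cm^-3 doping with 1 V across the junction.
x = depletion_thickness_cm(1e18, 1.0)
print(f"x = {x * 1e7:.1f} nm")
print(f"C = {depletion_capacitance(1e18, 1.0):.3e} F/cm^2")
print(f"E = {max_junction_field(1e18, 0.0, v_builtin=1.0):.3e} V/cm")
```

For a step junction the peak field equals twice the junction voltage divided by the depletion thickness, which gives a quick consistency check on the two expressions.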


We could alternatively use a graded-junction approximation, such as that used by 
Fair and Wivell [19]. For our purposes, the two approaches are nearly equivalent, so
I have used the simpler step-junction expression with the junction built-in voltage
$V_b$ = 1.1 V. In either case, the tunneling current density is a function of the maximum
electric field: 


$$J_j = G_0 V \frac{E}{E_0} e^{-E_0/E} \qquad (9.12)$$


The constant $E_0 = 2.9 \times 10^7$ V/cm was taken from Fair and Wivell [19], and the
preexponential factor $G_0 = 3 \times 10^9$ A/V cm$^2$ was chosen to fit the experimental data
plotted in Figure 9.7. It is significant that experimental data exist that allow us to
predict the tunneling currents in junctions of devices down to 0.03-μ feature sizes.
Previously [3], we pointed out that the “drain corner” tunneling occurs at lower
voltage than that across the junction area, a fact that has received considerable
attention [23]. For the present study, I will use Equation 9.10 for area tunneling,
both for simplicity and because I expect considerable cleverness on the part of
process designers as this phenomenon becomes limiting. I caution, however, that
corner effects may significantly increase the drain tunneling over the values shown
in the following figures.


9.3 Threshold Scaling


To determine the detailed properties of small devices, we must take into account the 
short-channel properties, most notable of which are carrier-velocity saturation and 
drain-induced barrier lowering (the precursor to punch-through). Previously [2], 
we developed a model that gives closed-form expressions for the current in short- 
channel devices, including the effects of velocity saturation. To apply the model, 
we need some abstraction of the vertical doping profile under the gate. The most 
widely used such abstraction is the threshold voltage $V_t$. We therefore proceed by
choosing a nominal threshold voltage of the form 


$$V_t = 0.55\, l^{0.23} \qquad (9.13)$$


The actual threshold voltage will be lower than the nominal one by the amount of 
drain-induced barrier lowering (DIBL) [24-27]. In this study, I use the expression 
given by Fjeldly and Shur [28]: 


asp te sinh(a,/A 
DIBL=V X cosh((l — tq)/A) — cosh(z,/A) 


where $x_s$ and $x_d$ are the classical depletion-layer thicknesses of the source and drain
junctions. I have used a surface potential of 0.5 V in Equation 9.9 to compute Zz, 
the thickness of the depletion layer under the channel. The distance scale $\lambda$ is given
by


(9.14) 


Cox " 
where the depletion-layer capacitance per unit area $C_s$ from channel to substrate is


and the oxide capacitance per unit area $C_{ox}$ from gate to channel is


$$C_{ox} = \frac{\epsilon_{ox}}{t_{ox}}$$


The nominal threshold voltage; the actual threshold voltage, including DIBL; and 
the supply voltage are plotted as a function of feature size in Figure 9.8. For the 
scaling parameters used in this study, DIBL does not become a serious problem 
until feature sizes are less than 0.03 μ.


9.4 Device Characteristics 


Threshold is defined as the gate voltage at which mobile charge $Q_s$ at the source
end of the channel changes the surface potential by $kT/q$ [2]. The channel charge
at threshold is


$$Q_t = \frac{kT}{q}(C_{ox} + C_s) \qquad (9.15)$$


For higher gate voltages, essentially all charge on the gate attracts equal and oppo- 
site countercharge of mobile carriers in the channel. Thus, we can form an excellent 
estimate of the channel charge $Q_s$ at the source end of the channel:


$$Q_s = C_{ox}(V - V_t) \qquad (9.16)$$
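

The two charge estimates can be sketched numerically as follows. The sketch reads the definition of threshold as the charge that shifts the surface potential by kT/q across the oxide and depletion capacitances in parallel, which is an interpretation stated here as an assumption. The function names, the permittivity constants, and the example oxide thickness, depletion depth, and voltages are all hypothetical.

```python
EPS_0 = 8.854e-14          # vacuum permittivity, F/cm
EPS_OX = 3.9 * EPS_0       # silicon dioxide (assumed relative permittivity 3.9)
EPS_SI = 11.7 * EPS_0      # silicon (assumed relative permittivity 11.7)
THERMAL_VOLTAGE = 0.0259   # kT/q near room temperature, V


def oxide_capacitance(t_ox_cm):
    """Oxide capacitance per unit area, F/cm^2."""
    return EPS_OX / t_ox_cm


def threshold_charge(t_ox_cm, x_dep_cm):
    """Channel charge per unit area at threshold: (kT/q) * (C_ox + C_s)."""
    c_s = EPS_SI / x_dep_cm  # depletion capacitance under the channel
    return THERMAL_VOLTAGE * (oxide_capacitance(t_ox_cm) + c_s)


def strong_inversion_charge(t_ox_cm, v_gate, v_threshold):
    """Channel charge per unit area above threshold: C_ox * (V - V_t)."""
    return oxide_capacitance(t_ox_cm) * (v_gate - v_threshold)


# Hypothetical device: 15 angstrom oxide, 10 nm depletion depth.
qt = threshold_charge(1.5e-7, 1.0e-6)
qs = strong_inversion_charge(1.5e-7, 0.36, 0.25)
print(f"Q_t = {qt:.3e} C/cm^2, Q_s(on) = {qs:.3e} C/cm^2")
```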


For gate voltages below $V_t$, channel current decreases exponentially with decreasing
gate voltage. At zero gate voltage, the channel charge is:


Qs = Que Ve /FT (9.17) 
where 
= UX 


Given $Q_t$ and $Q_s$, we can compute the saturated channel current for a minimum-sized
transistor of any given channel length using Equation (B.28) from [2]:


I 29,1 /1 = 
Tet = Qs% + QM (; + 1) ( —4j/it+ = (7 + t) (9.18) 
0 
where $v_0$, the saturated velocity of electrons in silicon, is taken to be $10^7$ cm/s [29],
and $l_0 = D/v_0$ can be thought of as the mean free path of the carrier, which is
taken to be 0.007 μ [2].


Fig. 9.8. Threshold voltage used in this study. The middle curve is the nominal threshold
voltage, given by Equation 9.11. The bottom curve is the actual threshold voltage, which
is lowered from the nominal value by drain-induced barrier lowering (DIBL), given by
Equation 9.12. The top curve is the nominal supply voltage from Equation 9.2.


We obtain the threshold current $I_t$ by substituting $Q_s = Q_t$ from Equation 9.13
into Equation 9.16. We obtain the on current $I_{on}$ by substituting $Q_s$ from
Equation 9.14 into Equation 9.16, using the threshold voltage lowered only by the built-in
junction voltage, rather than by the total junction voltage. We obtain the off
current $I_{off}$ by substituting $Q_s$ from Equation 9.15 into Equation 9.16, using the
threshold voltage as lowered by DIBL. These expressions thus represent a conservative
characterization of the transistor performance, since the on current will be
somewhat underestimated.


The several currents associated with a minimum-sized transistor are shown as a 
function of feature size in Figure 9.9. The trade-offs mentioned in the introduction 
are immediately apparent in this plot. As features become smaller, substrate doping 
must increase to prevent punch-through. The increase in substrate doping increases 
the junction electric field, thereby increasing drain-junction tunneling current into
the substrate. To limit the tunneling current to a reasonable value, we reduce
the supply voltage, thereby reducing the ratio of channel on current to channel off
current. The most remarkable conclusion from Figure 9.9 is that transistors of 0.03-μ
channel length still function essentially as do present-day devices. With proper
scaling of all parameters of the process, device miniaturization is alive and well. 
Many issues will arise in the development of ever-finer-scale fabrication, but, in the 
end, the endeavor will prevail. 


Given that devices at least one order of magnitude smaller than today’s are 
feasible, we may enquire what their characteristics may be. Figure 9.10 shows 
several quantities of interest. It is clear that discreteness of all quantities will become 
increasingly important at smaller feature sizes — particularly that of doping ions in 
the substrate. We have given elsewhere a simple discussion of the effects of discrete 
substrate charge [3]; a recent analysis is presented by Nishinohara et al. [30]. 


Perhaps the single most important aspect of device performance is the speed
of logic fabricated from any particular technology. We can estimate the time $\tau$
required for an elementary logic element to drive another like it:


$$\tau = \frac{V C_{tot}}{I_{on}} \qquad (9.19)$$


where the total capacitance $C_{tot}$ is taken to be three times the sum of the oxide
and drain junction capacitances. This delay should correspond rather directly to 
the delay per stage measured for ring oscillators in any given process, and is plotted 
along with several experimental points in Figure 9.11. It is remarkable that, despite 
the reduction in supply voltage at small feature sizes, logic performance continues 
to improve. Several authors have emphasized the improvement in speed that we 
can make available by reducing threshold and power-supply voltages [12, 31-33]. 
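

A minimal sketch of this delay estimate is given below. It assumes that the gate and the drain each occupy an area equal to the square of the feature size, and it takes the on current and the capacitances per unit area as inputs rather than computing them from the scaling laws. The function name and the example numbers are hypothetical.

```python
def stage_delay(v_supply, i_on, c_ox_per_area, c_junction_per_area, feature_cm):
    """Delay for one logic element to drive another like it, in seconds.

    The load is taken as three times the sum of the oxide and drain-junction
    capacitances, each evaluated over an assumed area of feature_cm squared.
    """
    area = feature_cm ** 2
    c_tot = 3.0 * (c_ox_per_area + c_junction_per_area) * area
    return v_supply * c_tot / i_on


# Hypothetical inputs: 1 V supply, 10 microamp on current, 0.1 micron feature.
tau = stage_delay(1.0, 1e-5, 1e-6, 1e-7, 1e-5)
print(f"tau = {tau:.2e} s")
```

Doubling the on current at fixed voltage and capacitance halves the delay, which is why the on current is the quantity of interest in the discussion of velocity saturation.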


The primary effect behind this somewhat counterintuitive trend is velocity satu- 
ration, an excellent recent account of which can be found in Noor Mohammad [29]. 
We gave an early treatment of the effect of velocity saturation on device character- 
istics [34]; an extended analysis appears in Appendix B of a previous work [2]. 


The supply voltage V affects the performance of standard CMOS digital logic 
in three ways: 


1. The channel charge is proportional to $V - V_t$.
2. The electric field in the channel is proportional to V.
3. The logic swing is proportional to V.


For long-channel devices, the carrier velocity is proportional to the electric field
in the channel. The channel current is the product of the channel charge and the
carrier velocity. Therefore, the device current has a quadratic dependence on the
supply voltage. This current must charge the load capacitance to approximately
one-half of the supply voltage to achieve a logic transition. This factor cancels one
of the V terms in the current, leaving the circuit speed linear in the supply voltage.


Fig. 9.9. Currents characteristic of minimum-sized devices as a function of feature size.
We obtain the threshold current $I_t$ by substituting $Q_s = Q_t$ from Equation 9.13 into
Equation 9.16. We obtain the on current $I_{on}$ by substituting $Q_s$ from Equation 9.14 into
Equation 9.16, using the threshold voltage lowered only by the built-in junction voltage,
rather than by the total junction voltage. We obtain the off current $I_{off}$ by substituting
$Q_s$ from Equation 9.15 into Equation 9.16, using the threshold voltage as lowered by the
full supply voltage. The junction tunneling current was computed from Equation 9.10,
assuming the drain area is the square of the feature size. The gate-oxide tunneling current
was computed from Equation 9.7, assuming that the full supply voltage is present across
the full gate area (the square of the feature size).


Once the carrier velocity is saturated, however, increasing the electric field in 
the channel no longer increases the channel current. Both the charge in transit and 
the voltage to be traversed by the output are increased by the same factor. In this 
regime, the only effect of increased supply voltage is an increase in the switching 
energy, with virtually no increase in performance. Just how close devices of the 
present day come to this limit can be seen in the delay-versus-voltage plots in the 
recent literature; see, for example, [6, 10, 14]. 
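

The contrast between the two regimes can be illustrated with a deliberately crude model in arbitrary units. It is a sketch of the proportionality argument only, not the closed-form current expression used in this study; the function name and the unit choices are assumptions.

```python
def relative_delay(v_supply, v_threshold, velocity_saturated):
    """Relative switching delay in arbitrary units.

    Channel charge is taken proportional to V - V_t. Carrier velocity is
    proportional to V in a long channel and pinned at a constant value once
    saturated. The output must swing through about half the supply voltage.
    """
    charge = v_supply - v_threshold
    velocity = 1.0 if velocity_saturated else v_supply
    current = charge * velocity
    swing = v_supply / 2.0
    return swing / current


# With a negligible threshold, doubling the supply halves the long-channel
# delay but leaves the velocity-saturated delay unchanged.
for saturated in (False, True):
    d1 = relative_delay(1.0, 0.0, saturated)
    d2 = relative_delay(2.0, 0.0, saturated)
    print("saturated" if saturated else "long-channel", d1, d2)
```

In the saturated case the extra supply voltage buys no speed, while the switching energy, which grows with the square of the voltage, continues to rise.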
Fig. 9.10. Number of signal levels resolvable by a minimum-sized device according to
the scaling laws used in this study. Thermal noise limits the analog depth representable
by a single voltage. The number of voltage levels above thermal noise was taken to be
the square root of the minimum stored energy shown in figure 3, expressed in units of
kT. The quantum-level separation was taken to be the energy spacing of states in a
one-dimensional box of length $l - x_s - x_d$. The number of electrons under the gate was taken
to be the on-value of $Q_s$ multiplied by the gate area (a slight overestimate). The number
of depletion ions was taken to be the doping density n given by Equation 9.8, multiplied
by the gate area and the depletion depth x from Equation 9.9, using 1 V for $\psi$. As the
number of depletion ions becomes smaller, the range of threshold voltages encountered
across a single chip increases. In analog systems, adaptation techniques can mitigate or
eliminate the variation among transistors.




[Figure: “Minimum Inverter”; vertical axis Delay (sec), horizontal axis Feature Size (μ).]


Fig. 9.11. Delay of minimally loaded inverter as a function of feature size. Filled triangles
are experimental results from ring oscillators reported in the literature. Solid line is the
expression given in Equation 9.17.


Because we have at our disposal the currents associated with all terminals of the 
transistor, we can evaluate the conductances associated with these currents. For 
logic devices to function properly, it is necessary that an elementary logic circuit 
have a gain greater than unity, which in turn requires that the transconductance Gm 
of the transistor be larger than the sum of all contributions to the drain conductance. 
As feature size decreases below 0.1 μ, both DIBL and drain-junction tunneling
make rapidly increasing contributions to the drain conductance, as can be seen in 
Figure 9.12. In spite of these parasitic effects, the device is still capable of providing 
greater than unity gain down to the smallest feature sizes investigated. 


9.5 System Properties 


The enormous effect of device scaling on computational capability becomes apparent 
only when viewed from the system level. We can estimate the system-level capa- 
bilities of digital chips fabricated with advanced processes by extrapolation from 
present-day systems. The first such extrapolation is the number of devices per unit 
area. If every transistor in a modern digital chip were to be shrunk to minimum 
size, the entire active area would cover approximately 2% of the chip area. If we
assume that this coverage factor can be maintained in future designs, the density of
active elements scales with feature size, as shown in Figure 9.13. The system clock
period in today’s processors is approximately $100\tau$. Even today, it is becoming more
economical to break each chip into several processors that can operate in parallel,
than it is to merely build larger “dinosaur” processors. For purposes of extrapolation,
we can assume that each processor contains $10^6$ transistors. The computation
available under these clearly oversimplified assumptions is plotted versus feature
size in Figure 9.14. If we further assume that all devices are in fact of minimum
size, and that they are clocked at the system-clock frequency, we can estimate the
power that will be dissipated by chips built in these advanced technologies. The
power attributable to useful switching, and the dissipations of various parasitic
currents that do not depend on clock speed, are shown in Figure 9.15. Down to about
0.03 μ feature size, most of the energy supplied to the chip is dissipated in real,
useful computation. Only below this scale do the parasitic currents overwhelm the
energy consumed in performing real computation.
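

The density extrapolation can be sketched in a few lines. The sketch takes the area of a minimum-sized device to be the square of the feature size, uses the 2% coverage factor, and adopts a million transistors per processor and roughly one hundred stage delays per clock period as working assumptions. The function names are hypothetical.

```python
def devices_per_cm2(feature_um, coverage=0.02):
    """Active devices per square centimeter for a given feature size in microns."""
    feature_cm = feature_um * 1e-4
    return coverage / feature_cm ** 2


def processors_per_cm2(feature_um, transistors_per_processor=1e6):
    """Processors per square centimeter under the same assumptions."""
    return devices_per_cm2(feature_um) / transistors_per_processor


def clock_period(stage_delay_s, stages_per_cycle=100):
    """System clock period as a multiple of the elementary stage delay."""
    return stages_per_cycle * stage_delay_s


for size in (1.0, 0.3, 0.1, 0.03):
    print(f"{size:5.2f} micron: {devices_per_cm2(size):.2e} devices/cm^2, "
          f"{processors_per_cm2(size):.0f} processors/cm^2")
```

Under these assumptions a 0.03-micron process gives somewhat more than two billion devices per square centimeter.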


9.6 Conclusions 


The MOS transistor has become the workhorse of modern microelectronics; it has 
survived many generations of process scaling to finer feature sizes. In this study, 
I have explored the extent to which the MOS device, as we know it today, can be 
scaled to still smaller dimensions. We have data available to provide experimental 
support for the tunneling currents that will be encountered in the heavily doped 
source and drain junctions of devices down to 0.03 μ. We have neither comparable
data to support the theory for oxides in the 10 Å range, nor
direct experimental verification of the effect of statistical fluctuations on very small
structures built in heavily doped material. As such data become available, we will
be better able to chart the course of future minification, of which the present study
is only an outline. It is already clear that MOS circuits will be integrated to upward
of $10^9$ devices per square centimeter merely by scaling, without any major change in
the conceptual framework that we use today. There are many challenges involved in 
this technology evolution [4], but I do not expect any show-stoppers. The prospect 
of very high levels of integration was daunting in 1971 when our earlier study was 
written, and is far more daunting today. Whereas massive parallelism is possible in 
present-day technology, it will clearly become mandatory if we are to realize even 
a fraction of the potential of more highly evolved technology. Even as this study 
is written, there is far more potential in a square centimeter of silicon than we 
have developed the paradigms to use, as has often been the case in periods of rapid 
technological evolution. 


Fig. 9.12. Several conductances associated with minimum-sized transistors, as a function
of feature size. The top curve is the transconductance. The filled triangles are experimental
values given in the literature, normalized to a minimum-sized device at the reported
dimension. The second curve is the drain conductance due to DIBL, computed by evaluating
Equation 9.16 at a drain voltage equal to V and at 0.9 V, and dividing the difference
by 0.1 V. The current through this conductance flows from drain to source. The bottom
curve is the drain conductance due to drain junction tunneling. Current through this
conductance flows from drain to substrate.


I should clarify the “limits” considered in this study. It is clear that devices
much smaller than those treated here can be made to show useful characteristics.
Conventional MOS devices can be fabricated on insulating substrates (SOI-SOS),
thereby removing the constraint imposed by substrate tunneling. Much smaller
devices are possible at molecular scale. The most obvious example of an extremely 
small device is an electron-transfer reaction occurring along an amino acid path, 
the potential of which is determined by the charge on a nearby atomic site. Such 
arrangements are thought to occur in many biological systems. The physics of such 
a transfer corresponds directly to that of an MOS transistor operating in weak 
inversion (below threshold). Imagining a device that functions is easy; building a 
device that works is much harder; and having a process by which billions of devices 
can be constructed in a single physical structure is many orders of magnitude harder 
still. I have limited this study to the consideration of direct extensions to existing 
technology. 


Fig. 9.13. Assumed number of active devices per square centimeter of chip area. If all
devices are of minimum size, active (transistor channel) area is 2% of total area.


Finally, I emphasize that I have considered only the properties of transistors
themselves, and have not even touched many other important aspects of the
technology. Of the latter, interconnect (both within a single chip and across chip
boundaries) is obviously a key concern. We have given elsewhere preliminary
discussion of the global scaling properties of a single-chip interconnect network for 
ultradense technology [1]. The topic of interconnect, along with many other issues, 
such as the fabrication technology itself, deserves a great deal of consideration as
the technology evolves. Whatever complications arise, however, it is clear that the 
technology will evolve. It will evolve because that evolution is possible, because 
there is so much to be gained at the system level by that evolution, and because 
the same energy and will on the part of bright, energetic, devoted people that has 
overcome enormous obstacles in the past will overcome those that lie ahead.


RICHARD FEYNMAN AND CELLULAR VACUUM


Marvin Minsky 


10.1 Richard Feynman 


He was not only a physicist — but also a great psychologist. Whatever Feynman 
thought about, he reflected upon his thinking, too — about how his mind computed 
itself. His explanations more than explained the subjects he was talking about. 
They also made you think of new ways for you to make yourself think about things. 


Feynman loved computation. He loved every aspect of it: Algorithms, computing 
machines, abstract theories about computations, and all. I had an old Marchant 
Calculator in my basement. Richard and I tried to fix it once, but never got it 
unjammed. I was somewhat annoyed, but Richard thought this was very funny. Of 
course, he thought everything was funny. Especially, he liked to discover novel ways 
to simplify calculations. 


Why did he like computation so much? Partly of course, he liked everything 
that had deep intellectual content. (And no one ever more despised ideas that only 
pretended to depth.) In particular, though, it usually was computation that showed 
the relations between a theory and an experiment. One thing that drew us together 
was a shared interest in extending the concept of a calculating machine. Earlier in 
my career, I had explored and promoted the idea that we could program comput- 
ers, not merely to calculate numerically, but to perform “symbolic computations” 
— that is, to manipulate not only the numbers but the symbols of mathematical 
expressions. This led to the first programs that could compute formal derivatives, 
integrals, and finally, formal solutions to differential equations. This was the sort 
of thing that Feynman needed for expanding the symbolic power series that arise in 
QED. When such large expressions were expanded and then ‘simplified’ by humans,
you could never tell when there might have been a symbolic (rather than numerical) 
error. 


Would computers eventually also be able to “compute” good new ideas? Cer- 
tainly, Feynman seemed sure of that; if we knew enough about how minds/brains 
work, then surely we could simulate them. The trouble is that we still don’t know 
much about such things. None of us are conscious yet — in the sense of understand- 
ing ourselves enough to have good ideas about how we think. (In that particular 
sense Richard Feynman was perhaps the most conscious person I’ve met.) Once 
Feynman showed me an impressive trick. He asked how many people were in the
room, and before I could even start to count he said that there were thirty-one.
Here’s how he did it: Without ceasing to speak, he had scanned the room, while
thinking a rhythm of visual groups: three, three, three, one. This enabled him
to count ten people per second, so at the end of three seconds he’d counted three 
tens and there was only one person left. He said he did it by using his bongo-drum 
expertise. 


10.2 Prospects of a Final Theory 


In one of his lectures, Feynman asked whether physics would always continue to 
evolve, with new “fundamental” laws — or, as he preferred to say — corrections 
to deficiencies in earlier discovered laws? Will we always find new phenomena that 
are not explained by our old ideas? Or will there someday come a time when the 
physicists finally find themselves with a theory that would never again need to be 
changed or augmented? 


I’ve asked many physicists whether they expected we would ever find such a 
“theory of everything”. Most of them seemed inclined to assume that there always 
would be new theories to come, perhaps for increasingly high energies. Only a few 
preferred the view that some final theory would be found. As for Feynman’s opinion, 
I didn’t make notes, but what I recall is that he saw no reason why physics should 
perpetually change. Of course there was no way to prove that wrong and (as he 
said in “Character” [1]) the only thing we can be sure of is that we can’t be sure of 
anything. In particular, even if we did reach a final theory, we’d have no way to be 
sure that we’d found it — so one must always be open to new evidence. However, 
he cautioned against worrying about this too much, lest it promote a discouraged 
attitude. 


Some of those physicists did not even like the prospect of a final theory. “That 
would be a terrible thing, because then there would be no mysteries left!” But 
Feynman had no patience for the idea that mysteries ought to be preserved, like 
endangered species. After all, even if you had all the basic laws, you’d still need 
new ways to compute their implications. The last time I saw him he was trying to 
find better ways to compute the predictions of QCD. 


10.3 Locally Finite Information Theories


What could be a possible form of such a final theory? Here’s one idea that spread 
around in the 1950s, along with the emergence of computer science. Consider that 
most of the deficiencies of classical physics were related to extremes of size and 
speed. I recall several friends suggesting that this could be because the universe is
“supposed to be” Newtonian — but, because it is actually only being simulated on 
some very large computer, there are some problems of precision. Perhaps, for very 
small quantities, it is round-off errors in the lowest bits that give rise to quantum
phenomena. As for larger, more cosmological quantities, perhaps it is the limited
word-length that prohibits transluminal velocities. This fantasy does not work 
well when one tries to fill in more details. However, it does suggest some other 
ideas that seem to be more promising: What if space-time is “discrete” — that 
is, composed of separate points, with no continuum between them? Feynman did 
like that idea because he felt that there might be something wrong with the old 
concept of continuous functions. How could there possibly be an infinite amount of 
information in any finite volume? Why do all those new particles appear when we 
try to put too much energy in one place? Could that be because there just isn’t 
room for that much information? 


Ed Fredkin pursued the idea that information must be finite in density. One 
day, he announced that things must be even more simple than that. He said that he 
was going to assume that information itself is conserved. “You’re out of your mind, 
Ed,” I pronounced. “That’s completely ridiculous. Nothing could happen in such 
a world. There couldn’t even be logical gates. No decisions could ever be made.” 
But when Fredkin gets one of his ideas, he’s quite immune to objections like that; 
indeed, they fuel him with energy. Soon he went on to assume that information 
processing must also be reversible — and invented what’s now called the Fredkin 
gate. 


When Bennett and Landauer wrote their paper showing that reversible, non- 
dissipative computation was indeed possible in principle, I was asked to be a referee 
for the journal they had submitted it to. I read the paper over and over — every 
day for a solid month. Finally I sent a note to the journal. “This result just doesn’t 
seem possible. However, I have read it very carefully, and cannot find where might 
be the mistake. I suppose you’ll just have to publish it!” 


Feynman too was skeptical. However, instead of merely checking their proof, 
he set out to prove it for himself. He came up with a different and simpler proof, 
which convinced many others that the discovery was believable. Soon he started to 
design the first quantum computers. 


10.4 The Idolatry of Uncertainty 


For generations the public has been told that modern physics has changed our view 
of the universe. Our teachers tell our children that “In the world of classical mechan- 
ics, everything worked like clockwork, with deterministic certainty. But Quantum 
Theory has shown us that things are indeterminate. The mechanical, deterministic 
world of Newton has been replaced by one in which everything is uncertain and 
unpredictable.” This view has become quite popular — but it actually puts things 
upside down. Uncertainty lay in the classical view — and it was quantum theory 
that actually showed why things could be depended on. It is true that Newton’s 
laws were replaced by a scheme in which such quantities as place and time are
separately indeterminate. But the implications of this are not what they seem, but
almost exactly the opposite. For it was the planetary orbits of classical mechanics
that were truly undependable, because of chaotic interactions. In contrast, the “or- 
bits” of electrons in atoms, according to quantum mechanics, are extremely stable 
— and it is these that enable us to have certainty! 


To explain this seeming paradox, let’s contrast two systems. The first is a 
classical Newtonian solar system that has a heavy object in the middle, and several 
planets surrounding it. In the particular case of the Solar System in which we 
live, Gerald Sussman and Jack Wisdom [2] have shown that the orbit of Pluto 
is chaotic. Pluto may eventually be hurled out of the solar system.
We earth-people might not consider this a serious loss. However, our large outer 
planets have more than enough angular momentum that, given suitable coupling, 
they could also throw Earth itself into outer space. (This should not be our chief 
concern, however, because simulations appear to show that it won’t happen before 
the Sun itself becomes a red giant.) Solar systems are unstable. So also would be 
molecules if their atoms behaved in accord with classical laws. Even if each atom 
were stable by itself, when they approach one another, the electron orbits inside 
them would soon be perturbed, and one or both atoms would soon break up. As 
Feynman said in his 1965 Lectures on Physics, “It is true classically that if we 
knew the position and velocity of every particle in the world, or in a box of gas, 
we could predict exactly what would happen. And therefore the classical world is 
deterministic. Suppose, however, that we have a finite accuracy and do not know 
exactly where just one atom is, say to one part in a billion. Then as it goes along 
it hits another atom, and because we did not know the position better than to one 
part in a billion, we find an even larger error in the position after the collision. And 
that is amplified, of course, in the next collision, so that if we start with only a tiny 
error it rapidly magnifies to a very great uncertainty” [3]. 
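

The amplification that Feynman describes can be illustrated with a toy system. The sketch below does not simulate colliding atoms; it swaps in the logistic map, a standard textbook example of chaos, and follows two trajectories whose starting points differ by one part in a billion. The function names, the starting value, and the number of steps are arbitrary assumptions.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r*x*(1-x) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def separation_history(x0=0.3, error=1e-9, steps=100):
    """Distance between two orbits whose starting points differ by `error`."""
    a = logistic_orbit(x0, steps)
    b = logistic_orbit(x0 + error, steps)
    return [abs(p - q) for p, q in zip(a, b)]


gaps = separation_history()
for step in (0, 10, 20, 30, 40):
    print(step, gaps[step])
```

The separation grows roughly geometrically until it is as large as the range of the map itself, after which the two trajectories are effectively unrelated.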


The Newtonian world was inadequate. If the atoms of our universe moved only 
according to Newton’s laws, there could exist no molecules, but only drifting, fea- 
tureless clouds. In contrast, chemical atoms are actually extremely stable because 
their electrons are constrained by quantum laws to occupy only certain separate lev- 
els of energy and momentum. Furthermore, combinations of atoms can combine to 
form configurations, called molecules, that are also confined to have definite states. 
Although the internal state of a molecule can change suddenly and unpredictably, 
such events may not happen for billions of years — during which there is abso- 
lutely no change at all. Our stability comes from those quantum fields, by which 
everything is locked into place, except during moments of clean, sudden change. 


In contrast, consider what happens in quantum theory, where each electron 
level remains unchanged until there occurs a transition jump. The result is that 
we can have molecules with covalent bonds, which can remain stable for billions of 
years. Thus, contrary to what our teachers say, it was in that classical world that 
everything was unstable and indeterminate, whereas it is those stable quantum levels 
that make possible chemistry, life, and nanotechnology. It is because of quantum 
states that you can remember what you had for breakfast; this is because new neural
connections made in your brain can persist throughout your day. When something 
more important occurs, you'll remember it as long as you live. Everything we can 
know depends on that “Quantum Certainty.” 


10.5 Quantum Psychology 


Early in this century, some physicists began to speculate that the uncertainty prin- 
ciple of quantum mechanics left room for the freedom of will. What attracted those 
physicists to such views? As I see it, they still believed in freedom of will as well 
as in quantum uncertainty — and these subjects had one thing in common: They 
both confounded those scientists’ conceptions of causality. To be sure, the “uncer- 
tainty principle” must affect (to whatever small extent) the tiny synapses in the 
brain; therefore those structures must act, to that extent, unpredictably. But mere 
probabilistic uncertainty offers no genuine freedom. It merely adds some
capriciousness to a system based on lawful rules. Two generations of philosophers (and
retired, burned-out physicists) have celebrated uncertainty in romantic, nonsensical
terms, not only regarding physics itself, but also the mind and its “freedom of
will.”


What connects the mind to the world? This problem has always caused conflicts 
between physics, psychology, and religion. In the world of Newton’s mechanical 
laws, every event was entirely caused by what had happened earlier. There was 
simply no room for anything else. Yet commonsense psychology said that events 
in the world were affected by minds: People could decide what occurred by using 
their freedom of will. Most religions concurred in this, although some preferred 
to believe in schemes involving divine predestination. Most theories in psychology 
were designed to support deterministic schemes, but those theories were usually too 
weak to explain enough of what happens in brains. In any case, neither physical 
nor psychological determinism left a place for the freedom of will. 


It is only because of quantum laws that what we call “things” can exist at all. 
It is why there can be bodies with separate cells, so that there can be animals with 
synapses, nerves, and memories... It is why we can have genes to specify brains 
in which memories can be maintained — so that we can have our illusions of will. 
Richard Feynman said, “It is therefore not fair to say that from the apparent free- 
dom and indeterminacy of the human mind, we should have realized that classical 
‘deterministic’ physics could not ever hope to understand it, and to welcome quantum
mechanics as a release from a ‘completely mechanistic’ universe. For already in
classical mechanics there was indeterminability from a practical point of view” [3]. 


We have heard two generations of philosophers and retired, burned-out physicists 
speaking about uncertainty in romantic, nonsensical terms — not only about basic 
physics itself, but about the mind and its “freedom of will.” Next time you hear 
this, ask them if they realize that only Quantum Certainty makes anything we know 
persist.


10.6 Cellular Vacuum Revisited 


This section was originally published in “Cellular Vacuum” in 1982 [4]. Here I’ve 
revised most of the original, mainly by deleting some unsound speculations. 


This fantasy about conservation in cellular arrays was inspired by the first con- 
ference on computation and physics. The “cellular array” idea had already emerged 
in such forms as Ising models, renormalization theories, the “game of life,” and Von 
Neumann’s work on self-reproducing machines. In the 1980’s the subject became 
more popular with the work of Tommaso Toffoli, Norman Margolus, Edward Fred- 
kin and several others. Richard Feynman was interested too, and encouraged my 
writing this essay (without approving its contents)... 


Imagine a crystalline space-time world of separate “cells” in which each volume 
of space contains only a finite (and bounded) amount of information. Assume also 
that the state of each cell is determined by the previous states of itself and its 
neighbors. (Time also comes in discrete moments, which synchronize the entire 
machine.) Over some range of size and speed, could the mechanics of such a world 
be approximately classical? To answer this, we’ll construct analogs of particles and 
fields, and ask what it would mean for these to satisfy constraints like conservation 
of momentum. In each case classical mechanics will break down on scales both small 
and large — and strange phenomena will emerge: A maximal velocity, a slowing 
of internal clocks, limits of simultaneous measurements, and quantumlike effects in 
very weak or intense fields. 


10.6.1 Cellular Arrays 


Envision space as a cubic array of finite-state “cells.” At any moment, each cell 
is in one of a few possible “states” — and the rules for how states change from 
one moment to the next are the “vacuum field equations” of this universe. These 
rules are starkly local: each cell’s state is determined only by its own and its neighbors’
states of the preceding moment. A one-dimensional example illustrates a simple 
moving “packet”: there are just four states: 1, P, “0” and Q. Initially all cells are 
“0” except that somewhere appears this pattern: 


. . . 0001111P000000000 . . .


A typical state-change rule has the form 1,Q,P → 1, which means that when a
cell in state Q sees a 1 to its left and a P to its right, it switches to state 1. Now
consider this set of state-change rules:

    1,1,P → P      0,1,P → 0      0,P,P → Q

    Q,0,0 → P      Q,P,X → Q      X,Q,X → 1
where X means the transition does not depend on that neighbor’s state. Unless
otherwise specified, each cell remains in its previous state. This initial configuration 
reproduces itself one unit over to the right, repeating this forever: 


t = 0    0 0 0 1 1 1 1 P 0 0 0 0 0
    1    0 0 0 1 1 1 P P 0 0 0 0 0
    2    0 0 0 1 1 P P P 0 0 0 0 0
    3    0 0 0 1 P P P P 0 0 0 0 0
    4    0 0 0 0 P P P P 0 0 0 0 0
    5    0 0 0 0 Q P P P 0 0 0 0 0
    6    0 0 0 0 1 Q P P 0 0 0 0 0
    7    0 0 0 0 1 1 Q P 0 0 0 0 0
    8    0 0 0 0 1 1 1 Q 0 0 0 0 0
    9    0 0 0 0 1 1 1 1 P 0 0 0 0
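

The moving packet is small enough to check by machine. What follows is a minimal Python sketch and not part of the original essay. The function names (`next_state`, `step`, `run`, `packet_period`) are illustrative, the rule list is one reading of the six rules given above, and cells beyond the ends of the finite row are assumed to stay in state 0.

```python
# Each rule is ((left, center, right), new_center); "X" matches any neighbor.
RULES = [
    (("1", "1", "P"), "P"),
    (("0", "1", "P"), "0"),
    (("0", "P", "P"), "Q"),
    (("Q", "0", "0"), "P"),
    (("Q", "P", "X"), "Q"),
    (("X", "Q", "X"), "1"),
]


def next_state(left, center, right):
    """Return the new state of a cell from its own and its neighbors' states."""
    for (l, c, r), new in RULES:
        if c == center and l in ("X", left) and r in ("X", right):
            return new
    return center  # unless otherwise specified, a cell keeps its state


def step(cells):
    """Advance a row of cells by one moment (assumed: state 0 beyond both ends)."""
    padded = "0" + cells + "0"
    return "".join(
        next_state(padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    )


def run(cells, steps):
    """Return the list of rows from moment 0 through moment `steps`."""
    history = [cells]
    for _ in range(steps):
        cells = step(cells)
        history.append(cells)
    return history


def packet_period(n_ones, max_steps=200):
    """Moments until a packet of n_ones 1s plus a P reappears one cell to the right."""
    start = "000" + "1" * n_ones + "P" + "0" * (n_ones + 4)
    shifted = "0" + start[:-1]
    cells = start
    for t in range(1, max_steps + 1):
        cells = step(cells)
        if cells == shifted:
            return t
    return None  # the packet did not reappear within max_steps


if __name__ == "__main__":
    for t, row in enumerate(run("0001111P00000", 9)):
        print(t, " ".join(row))
```

Running the block prints ten rows, and row 9 is row 0 shifted one cell to the right. In this sketch a packet of n ones (for n of at least 2) needs 2n + 1 moments per one-cell shift, so a longer packet moves more slowly.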


We can write state interaction rules to do almost anything one can imagine. There 
even exist “universal” sets of state-change rules with which a single cellular array can 
“simulate” any computation. The trick is to encode, into the universal array’s initial
conditions, another set of state-change rules. The universal rules then “interpret” 
those other rules. Roger Banks has described a remarkably simple universal scheme 
in which each cell has only two states, depending only on four neighbors [5]. 


Size and Precision. In the example above the size of a packet is inverse to its 
speed. More generally, there must be an absolute constraint between the amount 
of information in any packet and the volume of that packet! Just as in Heisenberg’s 
principle, it is not so much a parameter’s value that determines packet size, as its 
precision — the number of “bits of information” needed to specify its properties. 


If the information carried in a packet were “optimally encoded” then the packet’s 
size would depend on the base-2 logarithm of its precision. Then why is there no 
logarithm in Heisenberg’s principle? This suggests the conjecture that most physical 
information (particularly in photons) is encoded not in base 2, but in the less dense 
base-1 form. Later we'll argue that particles with rest mass may employ denser 
codes. This argument relates position not only to velocity but also to any other 
property, so this does not lead directly to the particular commutators of quantum 
theory. 
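

The difference between “base-1” and base-2 encoding can be made concrete with a few numbers. This is a small illustrative sketch, assuming that “precision” means the count n of distinguishable values and that one cell holds either one unary mark or one binary digit; the function names are hypothetical.

```python
def unary_cells(n):
    """Cells needed to store n as a run of n marks (base-1)."""
    return n


def binary_cells(n):
    """Cells needed to store any value from 0 to n in base 2."""
    return max(1, n.bit_length())


if __name__ == "__main__":
    for n in (1, 10, 100, 1000, 10**6):
        print(n, unary_cells(n), binary_cells(n))
```

Doubling the precision doubles the unary length but adds only one binary digit, which is the contrast the conjecture relies on.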


Uniform Motion. Any bounded packet that moves within a regular lattice 
must have an eventually repeating trajectory. Such trajectories will appear perfectly 
straight, on any large enough scale — so we can deduce Newton’s law of inertia (for 
compact particles) directly from the regularity of the lattice. 


Maximal Speed. Since no effect can propagate faster than the basic lattice 
speed of one cell per “moment,” there is a largest possible speed. Let’s identify 
this with the speed of light. It is easy to design small light-speed packets, but 
more machinery is needed for sub-light-speed propagation. There are fundamental 
differences between light-speed things (call them “photons”) and slower ones. Information
behind a photon’s wave front can never catch up, hence the information
mechanics of photons must be relatively simple; they cannot do “three-dimensional”
computations. However, if data can propagate diagonally, then some computation 
can proceed along the wavefront of a light-speed packet, and thus have a cyclic 
phase, or other more complex properties. 


Time Contraction. The faster a non-photon moves, the slower must proceed 
its internal computations. Imagine that some computation inside a packet has to 
send information back and forth through some number L of cells; then when the 
packet is ‘at rest’ the round-trip will take time 2L. Now, make the packet move 
in L’s direction at (N − 1)/N the speed of light. The retrograde time remains of
order L, but now it takes at least (N − 1)L moments for data to advance L spaces
(relative to the packet) so the round-trip time has then the order of NL. Therefore 
the speed of internal computations, relative to the fixed frame, must slow down 
by N/2. (This is not what Lorentz invariance requires. I complained to Feynman 
about not seeing how to make those ideas relativistic. He said not to worry about 
that because any good physicist should be able to fix that. I wish I’d asked for a 
few more hints.) 
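

The round-trip estimate can be checked with exact arithmetic. The sketch below is an idealization and not the essay's own calculation: it assumes a signal that moves one cell per moment while the packet moves at (N − 1)/N cells per moment, so the signal gains on the packet at rate 1 − v going forward and closes at rate 1 + v going back. The function names are hypothetical.

```python
from fractions import Fraction
import math


def round_trip(L, N):
    """Moments for a signal to cross L cells forward and back inside the packet."""
    v = Fraction(N - 1, N)      # packet speed in cells per moment
    forward = L / (1 - v)       # equals N * L
    backward = L / (1 + v)      # stays of order L
    return forward + backward


def slowdown(N, L=1):
    """Round-trip time relative to the packet at rest, where it is 2 * L."""
    return round_trip(L, N) / (2 * L)


def lorentz_gamma(N):
    """Ordinary Lorentz factor at speed (N - 1) / N, for comparison."""
    v = (N - 1) / N
    return 1 / math.sqrt(1 - v * v)


if __name__ == "__main__":
    for N in (2, 10, 100):
        print(N, slowdown(N), float(slowdown(N)), N / 2, lorentz_gamma(N))
```

Under these assumptions the slowdown is N²/(2N − 1), which approaches N/2 for large N. That expression is the square of the Lorentz factor rather than the factor itself, one way to see that the lattice estimate does not match what Lorentz invariance requires.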


Divergence and Aperture. In real optics it takes twice the aperture to halve 
a beam’s divergence. However, an “optimal” encoding of the angle should only need 
a single extra “bit.” This would suggest Nature uses “base-1” codes for photons 
— perhaps because only base-1 codes could let a discrete mechanism “add” quickly 
enough to make things linear at light speed. 


Frequency and Time. To maintain a beam’s divergence while shrinking both 
aperture and wavelength, the wavelength information must be spread out transver- 
sally. This suggests a constraint resembling the energy-time version of Heisenberg’s 
principle. 


10.6.2 Spherical Symmetry 


No regular lattice is invariant under rotation, Euclidean or Lorentz, since it needs 
different information to move along different axes. So, just as waves in crystals 
show Bragg diffraction, “discrete vacuums” must show angular anisotropies at some 
extremes of size or energy. Such problems already lurk beneath the surface of all 
other modern theories — but here our main concern is seeing how a discrete model 
could have other ordinary properties on ordinary scales. 


Liquid Lattice Model. One could imagine cell connections so randomly irregu- 
lar that, in the large, the space is isotropic — like water, which is almost crystalline 
from each atom to the next, but isotropic on the larger scale. But to build our 
packets into such a world, we would have to find transition rules insensitive to local 
cell-connection fluctuations. 


Continuous Creation. Instead of starting with a liquid vacuum, we could
randomly insert new cells from time to time. This would red shift cosmically old 
photons (by lengthening their unary frequency counters) and uniformly expand the 
universe. As Richard Stallman has pointed out, this would require an amorphous 
(rather than a regular lattice) cellular system — making it harder to invent adequate 
sets of transition rules. 


Spherical Propagation. How can we approximate isotropic propagation in 
a regular lattice? In the original essay, I posed this as the problem of how to 
produce asymptotically spherical expanding wave fronts in regular lattices. Since 
then, Margolus, Wolfram, and others discovered some surprisingly simple solutions. 
Another approach would be to make each particle emit showers of randomly oriented 
“force pellets.” This would produce an isotropic inverse-square force field — but 
raises the problem of how to approximate a uniform spherical distribution of such 
pellets. Another approach might be to fill the universe with a gas of light-speed 
momentum pellets whose “shadows” cause inverse-square forces. This transfers the 
isotropy burden to the universe as a whole but, as Richard Feynman pointed out in 
1963, this eventually drags everything to rest within a distinguished inertial frame. 


Curvature. Suppose a spherical force field were known to have emerged from 
a “unit charge.” Now represent that field by marking space itself as a family of 
equipotential surfaces. These markings need no further local information at all, 
because the field intensity at any point can be determined just from local curvature. 
However, for such a field to act on any particle, when that curvature is very small, 
the particle must probe correspondingly large distances. How could such a particle 
respond as though the interaction works at light speed? We discuss this in the next 
section. 


10.6.3 Fields 


The idea of a field abandons long range forces, and only asks the vacuum to constrain 
some local quantity — for example, by a differential equation. Classical ‘continuous’ 
theories assume that the vacuum can use computational methods that are
infinitely rapid and precise. This conceals many questions about the nature of 
distance and the character of partial derivatives — how can the vacuum measure 
and compute these? We have all become so comfortable with “real” numbers that 
we have come to think they are really real. Then we grumble when our theories 
give us series that make us pick and choose which terms to keep or throw away! 
The cellular model avoids all derivatives and real numbers. Here I'll try to show 
how we could make the state-change laws control a family of surfaces to act like 
a potential field on a charged particle. The resulting field has many peculiarities. 
Some of those may be just plain wrong — but perhaps some others could be related 
to other physical phenomena. 


To represent the potential field, we will simply “mark” those vacuum cells that 
are close to equipotential surfaces. In a classical field, the force on a particle will
depend on the gradient, which requires computing the surface normal — but in 
the cubic cellular model, there are no curved surfaces. Instead, imagine that the 
surfaces are entirely composed of axis-parallel polygons. Then we'll approximate 
the effect of the gradient by interacting simply with those polygons. Whenever a 
particle crosses a surface it will add a signed unit vector normal to that surface — 
so that as the particle moves along its trajectory, those polygons will separately 
supply vector components. 


Now, how could we make the field maintain the right shapes for those surfaces?
Some sort of physical activity must represent the information in the field’s Laplacian 
derivatives. What we'll do is to populate the vacuum with a gas of light-speed 
“exchange” photons — call them “ghotons.” We assign one ghoton to each oriented 
surface element. Each ghoton bounces repeatedly between one surface and the 
next one, along their common normal axis. The result is that the frequency of a 
ghoton’s reflections is in inverse proportion to the (axis-parallel) distances between 
those surfaces. Each collision moves the surface element one unit away — and this 
has the effect of a pressure that tends to push successive potential surfaces apart.
Now the corresponding field-gradient component along the x-axis is (approximately)
inverse to the x-distance between consecutive surfaces, so each ghoton collides once in
every $2/E_x$ moments; hence the impact “pressure” is proportional to the vector
$\mathbf{E}$. This way, we can approximate a field in spite of local finiteness, by using
something like “exchange forces.” Thus, these pressures will be in equilibrium 
when the intersurface spacings are approximately those of the corresponding field 
equipotentials. (So far as I know, no one has simulated such a model to see how 
well it could approximate solutions to a wave equation.) 
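

The rate argument can be written out as a short worked estimate. The symbols are notation assumed here, not the essay's: $d_x$ is the spacing in cells between consecutive surfaces along the x-axis, $\Delta\phi$ is the potential step between consecutive surfaces (taken as one unit), and $T_{\text{hit}}$ is the interval between successive impacts of one ghoton on a given surface.

```latex
E_x \approx \frac{\Delta\phi}{d_x} = \frac{1}{d_x},
\qquad
T_{\text{hit}} = 2\,d_x \approx \frac{2}{E_x},
\qquad
\text{pressure}_x \;\propto\; \frac{1}{T_{\text{hit}}} \approx \frac{E_x}{2}.
```

The middle relation says only that a light-speed ghoton needs $2 d_x$ moments for one round trip between two surfaces, so it strikes either surface once per round trip.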


Are such ideas worth further pursuit? I think they could lead to good new 
ideas. Classically, we tend to view exchange-force models as mere approximations. 
But it is at least conceivable that some such finite-based model might turn out to 
actually yield exact results. Then we might be led to conclude that all those infinite 
series and analytic integrals were merely artifacts that came from using continuous 
approximations for things that by Nature should have been discrete! 


10.6.4 Interactions with non-local “Particles” 


Consider a collision between two bodies A and B whose momenta are very precisely 
specified — hence their packets must be large in size. What happens when A and 
B exchange momentum — if some of that information lies far away? If they are 
to approximate a classical interaction, this will have to happen in a process that 
locally first “estimates” the dispersed particles’ momenta and then later repairs 
those estimates. In order that they interact at all, the particles must work with less 
than all the information classically required. It would seem that any such scheme 
that approximates prompt, conservative interaction must lead to some quantumlike 
phenomena. I have a vague idea of how this might work, but it’s so complicated
that I doubt it could lead to a sound theory. 


Step 1. An “event” occurs at some space-time locus in which the incoming 
momenta are “estimated,” and the outgoing momenta are determined by apply- 
ing classical rules to these estimates. Because of estimation errors, the scattered 
momentum sums will not exactly equal the initial sums, so we need an “error cor- 
rection” mechanism: 


Step 2. Each scattered particle deposits a “receipt” with the other particle, 
recording how much momentum was actually removed. These receipts are combined 
with those from previous interactions, and are later “discharged” in the course of 
subsequent interactions. (An alternative would be to discharge the receipt momentum
along the trajectory, but that would lead to curved trajectories.)


Such systems would show some quantumlike, mixed-state properties. When we 
measure a particle’s momentum in event 1, we cannot yet observe its “receipt” mo- 
mentum, because that cannot have any effect until some subsequent event 2 at which 
time it will be already mixed with another estimate. And so on. One can never 
simultaneously measure both estimates and receipts, though it all adds up eventually.
This involves no probabilities, but results from the temporary inaccessibility
of information.


Estimates and receipts would also permit tunneling-interactions that temporar- 
ily require more momentum than available. All is repaid eventually when receipts 
return their information in new estimates, but at every moment every particle car- 
ries invisible receipts not yet observable. Such models would show some qualitative 
features of quantum interference; if ever two particles were involved in the same 
interaction, they may still share receipt information that gives them some coherent 
“same random” properties in later interactions. 
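

The bookkeeping of estimates and receipts can be illustrated with a toy model. The following Python sketch is only one possible reading of the two steps, under loudly stated assumptions: momenta are integers in one dimension, an “estimate” is the momentum rounded down to a multiple of a hypothetical `grain`, and the classical rule is that two equal masses exchange momenta. None of these choices comes from the essay.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    momentum: int      # what a measurement would see right now
    receipt: int = 0   # hidden correction, not yet observable


def estimate(p, grain):
    """Coarse local estimate: round down to a multiple of grain (assumed rule)."""
    return (p // grain) * grain


def collide(a, b, grain):
    """One event: discharge old receipts, scatter on estimates, issue new receipts."""
    pa = a.momentum + a.receipt          # earlier receipts are discharged here
    pb = b.momentum + b.receipt
    ea, eb = estimate(pa, grain), estimate(pb, grain)
    a.momentum, b.momentum = eb, ea      # equal masses exchange estimated momenta
    a.receipt = pb - eb                  # what the estimate of b's momentum missed
    b.receipt = pa - ea                  # what the estimate of a's momentum missed


def visible(particles):
    """Total momentum that measurements could see at this moment."""
    return sum(p.momentum for p in particles)


def total(particles):
    """Total momentum including receipts that are not yet observable."""
    return sum(p.momentum + p.receipt for p in particles)


if __name__ == "__main__":
    ps = [Particle(17), Particle(-5), Particle(8)]
    print(visible(ps), total(ps))
    collide(ps[0], ps[1], grain=4)
    print(visible(ps), total(ps))
    collide(ps[1], ps[2], grain=4)
    print(visible(ps), total(ps))
```

In this toy the visible total drifts from event to event while the total including receipts never changes, which mirrors the claim that it all adds up eventually without any probabilities being involved.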


10.6.5 Rest Mass as a product of “vacuum saturation.” 


Why do we have particles with rest mass? I will argue that they are needed whenever
a field becomes so intense that neighboring planes have not enough space. Then, 
some information would have to be destroyed if we continue to use the base-1 
representations. However, there is a possible loophole: we can provide that at 
some certain threshold of intensity, the vacuum state rules impose a more compact 
coding — for example, by compression to base-2. That must sound silly, but it has 
some interesting consequences. First, this “abbreviation” process must be almost 
instantaneous, or information will be lost. But the light-speed limitation means that 
abbreviations must be done very locally, by compression into standard units, and 
that could lead to quantizations of momenta, orientations, spins, and energy levels. 
However, each such “abbreviation” must still carry enough information that, when 
the packet interacts again (or decays back into the field) the conserved quantities 
can be reproduced.


If an abbreviation carries a compressed momentum vector which can later inter- 
act in a computation, then that abbreviation must move at less than light speed. 
This suggests the conjecture that particles with rest mass are compressed, densely 
encoded representations of fragments of unary-coded fields. Presumably, then, a 
particle’s rest mass represents the potential energy of the field consumed to create 
it. So, in this fantasy, we must suffer the creation of particles purely to conserve 
the energy and momentum of strong fields. 


What else can we say about massive particles? There are many possible encodings
between unary codes, which can propagate at the speed of light, and
binary codes, which are the most compact possible. The closer the compression
approaches that ultimate density, the fewer ways will remain for different particles 
to share the same space — so we must see either stronger exclusion rules, or inter- 
actions in which particles change. So we can conclude, at least qualitatively, the 
following: 


1. Particles with rest mass have strong, short-range forces. Ultimately some 
information must be lost, at some threshold of intensity. If conservation of 
energy has the top priority, then geometric information has to be sacrificed 
first. That is, unless the basic state rules themselves are reversible. Fredkin 
has shown local time reversibility to be compatible with many cellular array 
computations, so it would seem of physical interest to consider time-reversible 
vacuum-state rules. 


2. When part of a field is compressed into a particle, this must relieve the com- 
pression of the field’s other surfaces, in an effect that propagates at luminal 
velocity. Hence the creation of particles must be accompanied by radiation. 


3. A massive particle moves slower than its field and takes time to “decay.” 
This suggests that particle creation cannot conserve all of a field’s topology. 
When later that particle returns its information to its field, this will happen 
at some other place — and the global configuration of the field there will 
change. Conjecture: Properties like charge represent relics of the original 
field’s topology. 


Can we pursue this down to the very lattice elements? Previously we argued 
that three unit base vectors would suffice to abbreviate a surface normal of a field. 
But when things become too much compressed, there might not be room enough 
for all those ghotons and surfaces. If a surface were disrupted, or two surfaces were 
merged, this would lead to long range effects. There are eight different types of 
trihedral vertices and twelve kinds of edges, and various of these might combine 
and/or annihilate one another. The state-transition rules would need to specify 
what should happen in such cases. For example, we might program the vacuum 
to attach “axis-objects” to disrupted surface-edges. Presumably these could not be
“observed” until enough of them combine to constitute or modify some other more 
“genuine” particle. In the meantime there would also be “radiation” as the rest of 
the field finds a new equilibrium configuration. It is easy to imagine various ways in 
which this could give rise to various six- and eight-fold “ways” that the field could 
use to restore itself. 


10.6.6 Conclusions 


In a cellular array, no field can work at light speed except by using “base-1” in- 
formation codes. To approximate a Coulomb field may require something like an 
exchange force. Objects with “rest mass” must emerge from suitably intense fields 
— to conserve information before it is squeezed to death — and these must move 
with subluminal speed. These illustrate how, starting with simple finitistic ideas, 
we can end up in a world cluttered with sluggish, complicated objects with queer 
interactions and exclusion rules, and peculiar short-range forces. 


Conservation also caused “uncertainty” to invade our simple world, because 
local finiteness requires that information be dispersed. We also found some need
to “balance the books” by using a complex system of “events,” “receipts,” and 
“estimates.” Surely we can find simpler schemes to approximate classical physics. 
The ones proposed here are much too weird — and probably wouldn’t work anyway. 
In spite of all these problems, the informational and computational clarity of such 
models could stimulate new insights. It remains to be seen whether this sort of 
approach can lead to good physical theories. 


In any case, I'll argue next, beings like us could not exist in a classical sort of 
universe. Only worlds with firmer constraints can evolve beings like us that make 
theories like this. For example, unless some conservation laws hold, too many things 
would tend to explode. Quantum-like states seem essential for life. In the popular 
view, quantum states lead to uncertainty. However, on the contrary, it is they that 
support our stability. 


This essay exploits many ideas I got from Edward Fredkin [6]. The ideas about 
field and particle are original; Richard Feynman persuaded me to consider fields 
instead of forces, but he’s not responsible for my compromise on potential surfaces. 
I also thank Danny Hillis and Richard Stallman for other ideas. 


Part III


Quantum Limits 


11


SIMULATING PHYSICS WITH COMPUTERS 


Richard P. Feynman * 


11.1 Introduction 


On the program it says this is a keynote speech — and I don’t know what a keynote 
speech is. I do not intend in any way to suggest what should be in this meeting 
as a keynote of the subjects or anything like that. I have my own things to say 
and to talk about and there’s no implication that anybody needs to talk about 
the same thing or anything like it. So what I want to talk about is what Mike 
Dertouzos suggested that nobody would talk about. I want to talk about the 
problem of simulating physics with computers and I mean that in a specific way 
which I am going to explain. The reason for doing this is something that I learned 
about from Ed Fredkin, and my entire interest in the subject has been inspired by 
him. It has to do with learning something about the possibilities of computers, and 
also something about possibilities in physics. If we suppose that we know all the 
physical laws perfectly, of course we don’t have to pay any attention to computers. 
It’s interesting anyway to entertain oneself with the idea that we've got something 
to learn about physical laws; and if I take a relaxed view here (after all I’m here 
and not at home) I’ll admit that we don’t understand everything. 


The first question is, “What kind of computer are we going to use to simulate 
physics?” Computer theory has been developed to a point where it realizes that 
it doesn’t make any difference; when you get to a universal computer, it doesn’t 
matter how it’s manufactured, how it’s actually made. Therefore my question is, 
“Can physics be simulated by a universal computer?” I would like to have the 
elements of this computer locally interconnected, and therefore sort of think about 
cellular automata as an example (but I don’t want to force it). But I do want 
something involved with the locality of interaction. I would not like to think of 
a very enormous computer with arbitrary interconnections throughout the entire 
thing. 


Now, what kind of physics are we going to imitate? First, I am going to describe 
the possibility of simulating physics in the classical approximation, a thing which is 
usually described by local differential equations. But the physical world is quantum 
mechanical, and therefore the proper problem is the simulation of quantum physics 
— which is what I really want to talk about, but I’ll come to that later. So what 
kind of simulation do I mean? There is, of course, a kind of approximate simulation 
in which you design numerical algorithms for differential equations, and then use
the computer to compute these algorithms and get an approximate view of what 
physics ought to do. That’s an interesting subject, but is not what I want to talk 
about. I want to talk about the possibility that there is to be an exact simulation,
that the computer will do exactly the same as nature. If this is to be proved and
the type of computer is as I’ve already explained, then it’s going to be necessary 
that everything that happens in a finite volume of space and time would have to be 
exactly analyzable with a finite number of logical operations. The present theory 
of physics is not that way, apparently. It allows space to go down into infinitesimal 
distances, wavelengths to get infinitely great, terms to be summed in infinite order, 
and so forth; and therefore, if this proposition is right, physical law is wrong. 


So good, we already have a suggestion of how we might modify physical law, 
and that is the kind of reason why I like to study this sort of problem. To take 
an example, we might change the idea that space is continuous to the idea that 
space perhaps is a simple lattice and everything is discrete (so that we can put it 
into a finite number of digits) and that time jumps discontinuously. Now let’s see 
what kind of a physical world it would be or what kind of problem of computation 
we would have. For example, the first difficulty that would come out is that the 
speed of light would depend slightly on the direction, and there might be other 
anisotropies in the physics that we could detect experimentally. They might be 
very small anisotropies. Physical knowledge is of course always incomplete, and
you can always say we’ll try to design something which beats experiment at the
present time, but which predicts anisotropies on some scale to be found later. That’s
fine. That would be good physics if you could predict something consistent with all
the known facts and suggest some new fact that we didn’t explain, but I have no
specific examples. So I’m not objecting to the fact that it’s anisotropic in principle,
it’s a question of how anisotropic. If you tell me it’s so-and-so anisotropic, I’ll tell you
about the experiment with the lithium atom which shows that the anisotropy is less
than that much, and that this here theory of yours is impossible. 


Another thing that had been suggested early was that natural laws are reversible, 
but that computer rules are not. But this turned out to be false; the computer rules 
can be reversible, and it has been a very, very useful thing to notice and to discover 
that. This is a place where the relationship of physics and computation has turned 
itself the other way and told us something about the possibilities of computation. 
So this is an interesting subject because it tells us something about computer rules, 
and might tell us something about physics. 


The rule of simulation that I would like to have is that the number of computer 
elements required to simulate a large physical system is only to be proportional to 
the space-time volume of the physical system. I don’t want to have an explosion. 
That is, if you say I want to explain this much physics, I can do it exactly and I 
need a certain-sized computer. If doubling the volume of space and time means I’ll 
need an exponentially larger computer, I consider that against the rules (I make up 
the rules, I’m allowed to do that). Let’s start with a few interesting questions. 


Fig. 11.1. 


11.2 Simulating Time 


First I’d like to talk about simulating time. We’re going to assume it’s discrete. 
You know that we don’t have infinite accuracy in physical measurements so time 
might be discrete on a scale of less than $10^{-27}$ sec. (You’d have to have it at least 
like this to avoid clashes with experiment — but make it $10^{-41}$ sec. if you like, and 
then you’ve got us!) One way in which we simulate time — in cellular automata, 
for example — is to say that “the computer goes from state to state.” But really, 
that’s using intuition that involves the idea of time — you’re going from state to 
state. And therefore the time (by the way, like the space in the case of cellular 
automata) is not simulated at all, it’s imitated in the computer. 


An interesting question comes up: “Is there a way of simulating it, rather than 
imitating it?” Well, there’s a way of looking at the world that is called the space- 
time view, imagining that the points of space and time are all laid out, so to speak, 
ahead of time. And then we could say that a “computer” rule (now computer would 
be in quotes, because it’s not the standard kind of computer which operates in time) 
is: We have a state $s_i$ at each point $i$ in space-time. (See Figure 11.1) The state $s_i$ 
at the space-time point $i$ is a given function $F_i(s_j, s_k, \ldots)$ of the state at the points 
$j$, $k$ in some neighborhood of $i$: 


$$s_i = F_i(s_j, s_k, \ldots)$$


You’ll notice immediately that if this particular function is such that the value 
of the function at $i$ only involves the few points behind in time, earlier than this 
time $i$, all I’ve done is to redescribe the cellular automaton, because it means that 
you calculate a given point from points at earlier times, and I can compute the next 
one and so on, and I can go through this in that particular order. But just let’s 
think of a more general kind of computer, because we might have a more general 
function. So let’s think about whether we could have a wider case of generality of 
interconnections of points in space-time. If F depends on all the points both in 
the future and the past, what then? That could be the way physics works. I'll 
mention how our theories go at the moment. It has turned out in many physical 
theories that the mathematical equations are quite a bit simplified by imagining 
such a thing — by imagining positrons as electrons going backwards in time, and 
other things that connect objects forward and backward. The important question 
would be, if this computer were laid out, is there in fact an organized algorithm 
by which a solution could be laid out, that is, computed? Suppose you know this 
function $F_i$ and it is a function of the variables in the future as well. How would 
you lay out numbers so that they automatically satisfy the above equation? It may 
not be possible. In the case of the cellular automaton it is, because from a given 
row you get the next row and then the next row, and there’s an organized way of 
doing it. It’s an interesting question whether there are circumstances where you 
get functions for which you can’t think, at least right away, of an organized way of 
laying it out. Maybe sort of shake it down from some approximation, or something, 
but it’s an interesting different type of computation. 
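The special case mentioned above, where the function at a point involves only points at the immediately earlier time, can be sketched in a few lines. This is only an illustration: the particular rule (XOR of the two neighbours on a ring) and the names `step` and `fill_spacetime` are assumptions made here, not anything specified in the lecture. It shows the "organized way" of laying out the space-time array, one row after another.

```python
def step(row):
    """Compute the next time-row from the previous one.

    Assumed rule (illustration only): each cell becomes the XOR of its
    left and right neighbours at the earlier time, with periodic boundaries.
    """
    n = len(row)
    return [row[(i - 1) % n] ^ row[(i + 1) % n] for i in range(n)]


def fill_spacetime(initial_row, n_steps):
    """Lay out the whole space-time array, one row per time step."""
    rows = [list(initial_row)]
    for _ in range(n_steps):
        # Each new row needs only the row just before it.
        rows.append(step(rows[-1]))
    return rows


if __name__ == "__main__":
    start = [0, 0, 0, 1, 0, 0, 0]
    for row in fill_spacetime(start, 3):
        print("".join(str(s) for s in row))
```

A rule that also depended on rows at later times could not be filled in by this simple loop, which is the difficulty raised in the text.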


Question: “Doesn’t this reduce to the ordinary boundary value, as opposed to 
initial-value type of calculation?” 


Answer: “Yes, but remember this is the computer itself that I’m describing.” 


It appears actually that classical physics is causal. You can, in terms of the 
information in the past, if you include both momentum and position, or the position 
at two different times in the past (either way, you need two pieces of information at 
each point) calculate the future in principle. So classical physics is local, causal, and 
reversible, and therefore apparently quite adaptable (except for the discreteness and 
so on, which I already mentioned) to computer simulation. We have no difficulty, 
in principle, apparently, with that. 


11.3 Simulating Probability 


Turning to quantum mechanics, we know immediately that here we get only the 
ability, apparently, to predict probabilities. Might I say immediately, so that you 
know where I really intend to go, that we always have had (secret, secret, close the 
doors!) we always have had a great deal of difficulty in understanding the world 
view that quantum mechanics represents. At least I do, because I’m an old enough 
man that I haven’t got to the point that this stuff is obvious to me. Okay, I still 
get nervous with it. And therefore, some of the younger students ... you know how 
it always is, every new idea, it takes a generation or two until it becomes obvious 
that there’s no real problem. It has not yet become obvious to me that there’s 
no real problem. I cannot define the real problem, therefore I suspect there’s no 
real problem, but I’m not sure there’s no real problem. So that’s why I like to 
investigate things. Can I learn anything from asking this question about computers 
— about this may-or-may-not-be mystery as to what the world view of quantum 
mechanics is? So I know that quantum mechanics seems to involve probability — 
and I therefore want to talk about simulating probability. 


Well, one way that we could have a computer that simulates a probabilistic 
theory, something that has a probability in it, would be to calculate the probability 
and then interpret this number to represent Nature. For example, let’s suppose 
that a particle has a probability $P(x,t)$ to be at $x$ at a time $t$. A typical example 
of such a probability might satisfy a differential equation, as, for example, if the 
particle is diffusing: 


$$\frac{\partial P(x,t)}{\partial t} = -\nabla^2 P(x,t)$$


Now we could discretize $t$ and $x$ and perhaps even the probability itself and 
solve this differential equation like we solve any old field equation, and make an 
algorithm for it, making it exact by discretization. First there’d be a problem 
about discretizing probability. If you are only going to take $k$ digits it would mean 
that when the probability is less than $2^{-k}$ of something happening, you say it doesn’t 
happen at all. In practice we do that. If the probability of something is $10^{-700}$, 
we say it isn’t going to happen, and we’re not caught out very often. So we could 
allow ourselves to do that. But the real difficulty is this: If we had many particles, 
we have $R$ particles, for example, in a system, then we would have to describe the 
probability of a circumstance by giving the probability to find these particles at 
points $x_1, x_2, \ldots, x_R$ at the time $t$. That would be a description of the probability 
of the system. And therefore, you’d need a $k$-digit number for every configuration 
of the system, for every arrangement of the $R$ values of $x$. And therefore if there 
are $N$ points in space, we’d need $N^R$ configurations. Actually, from our point of 
view that at each point in space there is information like electric fields and so on, 
$R$ will be of the same order as $N$ if the number of information bits is the same as 
the number of points in space, and therefore you’d have to have something like $N^N$ 
configurations to be described to get the probability out, and that’s too big for our 
computer to hold if the size of the computer is of order $N$. 


We emphasize, if a description of an isolated part of Nature with N variables 
requires a general function of N variables and if a computer simulates this by 
actually computing or storing this function then doubling the size of nature ($N \to 2N$) 
would require an exponentially explosive growth in the size of the simulating 
computer. It is therefore impossible, according to the rules stated, to simulate by 
calculating the probability. 
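To get a feeling for the size of the explosion, here is a small counting sketch. It assumes, as in the text, that a full probability table needs one number per configuration, that there are N points and R particles, and that R is taken equal to N. The function names are hypothetical and chosen only for this illustration.

```python
def table_entries(n_points, n_particles):
    """Number of configurations of n_particles placed on n_points sites."""
    return n_points ** n_particles


def growth_on_doubling(n):
    """Factor by which an N**N table grows when N is doubled (R taken equal to N)."""
    return table_entries(2 * n, 2 * n) // table_entries(n, n)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        # A computer of size proportional to N would only double here.
        print(n, table_entries(n, n), growth_on_doubling(n))
```

Doubling N doubles the size of a computer that is proportional to N, while the table of configurations grows by a factor that itself increases rapidly with N.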


Is there any other way? What kind of simulation can we have? We can’t expect 
to compute the probability of configurations for a probabilistic theory. But the 
other way to simulate a probabilistic Nature, which I’ll call $\mathcal{N}$ for the moment, 
might still be to simulate the probabilistic Nature by a computer $C$ which itself is 
probabilistic, in which you always randomize the last two digits of every number, or 
you do something terrible to it. So it becomes what I’ll call a probabilistic computer, 
in which the output is not a unique function of the input. And then you try to work 
it out so that it simulates Nature in this sense: that $C$ goes from some state — initial 
state if you like — to some final state with the same probability that $\mathcal{N}$ goes from 
the corresponding initial state to the corresponding final state. Of course when you 
set up the machine and let Nature do it, the imitator will not do the same thing, it 
only does it with the same probability. Is that no good? No it’s okay. How do you 
know what the probability is? You see, Nature’s unpredictable; how do you expect 
to predict it with a computer? You can’t — it’s unpredictable if it’s probabilistic. 
But what you really do in a probabilistic system is repeat the experiment in Nature 
a large number of times. If you repeat the same experiment in the computer a 
large number of times (and that doesn’t take any more time than it does to do the 
same thing in Nature of course), it will give the frequency of a given final state 
proportional to the number of times, with approximately the same rate (plus or 
minus the square root of n and all that) as it happens in Nature. In other words, 
we could imagine and be perfectly happy, I think, with a probabilistic simulator of 
a probabilistic Nature, in which the machine doesn’t exactly do what Nature does, 
but if you repeated a particular type of experiment a sufficient number of times to 
determine Nature’s probability, then you did the corresponding experiment on the 
computer, you’d get the corresponding probability with the corresponding accuracy 
(with the same kind of accuracy of statistics). 
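The idea of repeating the experiment on the probabilistic computer can be sketched as follows. This is a toy with a single two-outcome "experiment" whose probability `p_final` is simply assumed. The names `run_once` and `estimate_frequency` are hypothetical, and Python's pseudo-random generator stands in for the probabilistic element.

```python
import random


def run_once(p_final, rng):
    """One run of a hypothetical two-outcome experiment: True with probability p_final."""
    return rng.random() < p_final


def estimate_frequency(p_final, n_runs, seed=0):
    """Repeat the experiment n_runs times and return the observed frequency."""
    rng = random.Random(seed)
    hits = sum(run_once(p_final, rng) for _ in range(n_runs))
    return hits / n_runs


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        # The scatter around 0.3 shrinks roughly like 1 / sqrt(n).
        print(n, estimate_frequency(0.3, n))
```

No single run predicts the outcome. Only the frequency over many runs is compared with the probability, with the usual statistical scatter.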


So let us now think about the characteristics of a local probabilistic computer, 
because I'll see if I can imitate Nature with that (by “Nature” I’m now going to 
mean quantum mechanics). One of the characteristics is that you can determine 
how it behaves in a local region by simply disregarding what it’s doing in all other 
regions. For example, suppose there are variables in the system that describe the 
whole world $(x_A, x_B)$ — the variables $x_A$ you’re interested in, they’re “around 
here”; $x_B$ are the whole rest of the world. If you want to know the probability 
that something around here is happening, you would have to get that by integrating 
the total probability of all kinds of possibilities over $x_B$. If we had computed this 
probability, we would still have to do the integration 


$$P_A(x_A) = \int P(x_A, x_B)\, dx_B$$


which is a hard job! But if we have imitated the probability, it’s very simple to do 
it: you don’t have to do anything to do the integration, you simply disregard what 
the values of $x_B$ are, you just look at the region $x_A$. And therefore it does have the 
characteristic of Nature: if it’s local, you can find out what’s happening in a region 
not by integrating or doing an extra operation, but merely by disregarding what 
happens elsewhere, which is no operation, nothing at all. 
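A small sketch of the contrast between the two routes, under stated assumptions: the joint distribution `JOINT` over two binary variables is invented for the illustration, and the function names are hypothetical. The first function does the explicit sum over the "rest of the world"; the second draws whole configurations and simply ignores the second variable.

```python
import random

# Hypothetical joint distribution P(x_A, x_B) over two binary variables.
JOINT = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}


def marginal_by_summing(joint, a):
    """'Computed' route: explicitly sum P(x_A, x_B) over all x_B."""
    return sum(p for (xa, _xb), p in joint.items() if xa == a)


def marginal_by_sampling(joint, a, n_samples, seed=0):
    """'Imitated' route: draw whole configurations, then just ignore x_B."""
    rng = random.Random(seed)
    configs = list(joint)
    weights = [joint[c] for c in configs]
    draws = rng.choices(configs, weights=weights, k=n_samples)
    return sum(1 for (xa, _xb) in draws if xa == a) / n_samples


if __name__ == "__main__":
    print(marginal_by_summing(JOINT, 0))
    print(marginal_by_sampling(JOINT, 0, 20_000))
```

In the sampled version no extra operation is performed on the second variable at all, which is the point made in the text.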


The other aspect that I want to emphasize is that the equations will have a 
form, no doubt, something like the following. Let each point $i = 1, 2, \ldots, N$ in 
space be in a state $s_i$ chosen from a small state set (the size of this set should be 
reasonable, say, up to $2^5$). And let the probability to find some configuration $\{s_i\}$ 
(a set of values of the state $s_i$ at each point $i$) be some number $P(\{s_i\})$. It satisfies 
an equation such that at each jump in time 


$$P_{t+1}(\{s\}) = \sum_{\{s'\}} \left[ \prod_i m(s_i \mid s'_j, s'_k, \ldots) \right] P_t(\{s'\})$$


where $m(s_i \mid s'_j, s'_k, \ldots)$ is the probability that we move to state $s_i$ at point $i$ when the 
neighbors have values $s'_j, s'_k, \ldots$, where $j$, $k$ etc. are points in the neighborhood of $i$. 


As $j$ moves far from $i$, $m$ becomes ever less sensitive to $s'_j$. At each change the state 
at a particular point $i$ will move from what it was to a state $s$ with a probability $m$ 
that depends only upon the states of the neighborhood (which may be so defined 
as to include the point $i$ itself). This gives the probability of making a transition. 
It’s the same as in a cellular automaton; only, instead of its being definite, it’s a 
probability. Tell me the environment, and I’ll tell you the probability after a next 
moment of time that this point is at state s. And that’s the way it’s going to work, 
okay? So you get a mathematical equation of this kind of form. 
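A local probabilistic computer of this kind might look like the following sketch. The transition rule is an assumption made purely for illustration: two states per site, a three-site neighbourhood on a ring, and a "noisy majority" probability. The names `m`, `update` and `run` are hypothetical.

```python
import random


def m(new_state, left, centre, right):
    """Hypothetical local transition probability for one site.

    Assumed rule: a site takes the majority value of its three-site
    neighbourhood with probability 0.9 and the opposite value with 0.1.
    """
    majority = 1 if (left + centre + right) >= 2 else 0
    return 0.9 if new_state == majority else 0.1


def update(config, rng):
    """One time jump: every site is resampled from its own local probability."""
    n = len(config)
    new = []
    for i in range(n):
        left, centre, right = config[(i - 1) % n], config[i], config[(i + 1) % n]
        p_one = m(1, left, centre, right)
        new.append(1 if rng.random() < p_one else 0)
    return new


def run(config, n_steps, seed=0):
    """Apply n_steps time jumps and return the final configuration."""
    rng = random.Random(seed)
    for _ in range(n_steps):
        config = update(config, rng)
    return config


if __name__ == "__main__":
    print(run([0, 1, 1, 0, 1, 0, 0, 1], 5))
```

The machine never stores the probability of a whole configuration. It only needs the local rule and the current states, so its size stays proportional to the number of sites.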


Now I explicitly go to the question of how we can simulate with a computer — a 
universal automaton or something — the quantum mechanical effects. (The usual 
formulation is that quantum mechanics has some sort of a differential equation 
for a function $\psi$.) If you have a single particle, $\psi$ is a function of $x$ and $t$, and 
this differential equation could be simulated just like my probabilistic equation was 
before. That would be all right and one has seen people make little computers which 
simulate the Schrödinger equation for a single particle. But the full description 
of quantum mechanics for a large system with $R$ particles is given by a function 
$\psi(x_1, x_2, \ldots, x_R, t)$ which we call the amplitude to find the particles at $x_1, \ldots, x_R$, 
and therefore, because it has too many variables, it cannot be simulated with a 
normal computer with a number of elements proportional to R or proportional 
to N. We had the same troubles with the probability in classical physics. And 
therefore, the problem is, how can we simulate quantum mechanics? There are two 
ways that we can go about it. We can give up on our rule about what the computer 
was, we can say: Let the computer itself be built of quantum mechanical elements 
which obey quantum mechanical laws. Or we can turn the other way and say: Let 
the computer still be the same kind that we thought of before — a logical, universal 
automaton; can we imitate this situation? And I’m going to separate my talk here, 
for it branches into two parts. 


11.4 Quantum Computers — Universal Quantum Simulators 


The first branch, one you might call a side-remark, is, “Can you do it with a new 
kind of computer — a quantum computer?” (I'll come back to the other branch in 
a moment.) Now it turns out, as far as I can tell, that you can simulate this with a 
quantum system, with quantum computer elements. It’s not a Turing machine, but 
a machine of a different kind. If we disregard the continuity of space and make it 
discrete, and so on, as an approximation (the same way as we allowed ourselves in 
the classical case), it does seem to be true that all the various field theories have the 
same kind of behavior, and can be simulated in every way, apparently, with little 
latticeworks of spins and other things. It’s been noted time and time again that 
the phenomena of field theory (if the world is made in a discrete lattice) are well 
imitated by many phenomena in solid state theory (which is simply the analysis of 
a latticework of crystal atoms, and in the case of the kind of solid state I mean, each 
atom is just a point which has numbers associated with it, with quantum mechanical 
rules). For example, the spin waves in a spin lattice imitate Bose particles in the 
field theory. I therefore believe it’s true that with a suitable class of quantum 
machines you could imitate any quantum system, including the physical world. 
But I don’t know whether the general theory of this intersimulation of quantum 
systems has ever been worked out, and so I present that as another interesting 
problem: To work out the classes of different kinds of quantum mechanical systems 
which are really intersimulatable — which are equivalent — as has been done in 
the case of classical computers. It has been found that there is a kind of universal 
computer that can do anything, and it doesn’t make much difference specifically 
how it’s designed. The same way we should try to find out what kinds of quantum 
mechanical systems are mutually intersimulatable, and try to find a specific class, 
or a character of that class which will simulate everything. What, in other words, is 
the universal quantum simulator (assuming this discretization of space and time)? 
If you had discrete quantum systems, what other discrete quantum systems are 
exact imitators of it, and is there a class against which everything can be matched? 
I believe it’s rather simple to answer that question and to find the class, but I just 
haven’t done it. 


Suppose that we try the following guess: that every finite quantum mechanical 
system can be described exactly, imitated exactly, by supposing that we have 
another system such that at each point in space-time this system has only two pos- 
sible base states. Either that point is occupied, or unoccupied — those are the two 
states. The mathematics of the quantum mechanical operators associated with that 
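point would be very simple. 


(Rows and columns are ordered occupied, unoccupied.) 

$$a = \text{ANNIHILATE} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = \tfrac{1}{2}(\sigma_x - i\sigma_y)$$

$$a^* = \text{CREATE} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \tfrac{1}{2}(\sigma_x + i\sigma_y)$$

$$n = \text{NUMBER} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = a^* a = \tfrac{1}{2}(1 + \sigma_z)$$

$$1 = \text{IDENTITY} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$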


There would be an operator a which annihilates if the point is occupied — it 
changes it to unoccupied. There is a conjugate operator a* which does the opposite: 
If it’s unoccupied, it occupies it. There’s another operator n called the number 
to ask, “Is something there?” The little matrices tell you what they do. If it’s 
there, n gets a one and leaves it alone; if it’s not there, nothing happens. That’s 
mathematically equivalent to the product of the other two, as a matter of fact. 
And then there’s the identity, 1, which we always have to put in there to complete 
our mathematics — it doesn’t do a damn thing! By the way, on the right-hand 
side of the above formulas the same operators are written in terms of matrices that 
most physicists find more convenient, because they are Hermitian, and that seems 
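to make it easier for them. They have invented another set of matrices, the Pauli 
$\sigma$ matrices: 


$$\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad 1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$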


And these are called spin — spin one-half — so sometimes people say you're talking 
about a spin-one-half lattice. 
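The relations between the occupation operators and the Pauli matrices can be checked with a few lines of plain Python. The sketch assumes the basis ordering (occupied, unoccupied), so that the annihilation operator has its single nonzero entry in the lower left. The helper names `matmul`, `add` and `scale` are introduced here only for the check.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def add(a, b):
    """Entry-wise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def scale(c, a):
    """Multiply every entry of a 2x2 matrix by the number c."""
    return [[c * a[i][j] for j in range(2)] for i in range(2)]


# Pauli matrices and identity.
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]
IDENTITY = [[1, 0], [0, 1]]

# Assumed basis order: index 0 = occupied, index 1 = unoccupied.
ANNIHILATE = [[0, 0], [1, 0]]
CREATE = [[0, 1], [0, 0]]
NUMBER = [[1, 0], [0, 0]]

# The same operators written in terms of the Pauli matrices.
lowering = scale(0.5, add(SIGMA_X, scale(-1j, SIGMA_Y)))
raising = scale(0.5, add(SIGMA_X, scale(1j, SIGMA_Y)))
number = scale(0.5, add(IDENTITY, SIGMA_Z))

if __name__ == "__main__":
    print(lowering == ANNIHILATE, raising == CREATE, number == NUMBER)
    print(matmul(CREATE, ANNIHILATE) == NUMBER)
```

With the opposite basis ordering the roles of the two off-diagonal matrices would be exchanged, so the ordering is worth stating whenever these matrices are written down.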


The question is, “If we wrote a Hamiltonian which involved only these operators, 
locally coupled to corresponding operators on the other space-time points, could we 
imitate every quantum mechanical system which is discrete and has a finite number 
of degrees of freedom?” I know, almost certainly, that we could do that for any 
quantum mechanical system which involves Bose particles. I’m not sure whether 
Fermi particles could be described by such a system. So I leave that open. Well, 
that’s an example of what I meant by a general quantum mechanical simulator. 
I’m not sure that it’s sufficient, because I’m not sure that it takes care of Fermi 
particles. 


11.5 Can Quantum Systems be Probabilistically Simulated 
by a Classical Computer? 


Now the next question that I would like to bring up is, of course, the interesting 
one, i.e., “Can a quantum system be probabilistically simulated by a classical (prob- 
abilistic, I'd assume) universal computer?” In other words, a computer which will 
give the same probabilities as the quantum system does. If you take the computer 
to be the classical kind I’ve described so far, (not the quantum kind described in the 
last section) and there’re no changes in any laws, and there’s no hocus-pocus, the 
answer is certainly, “No!” This is called the hidden-variable problem: It is impossi- 
ble to represent the results of quantum mechanics with a classical universal device. 
To learn a little bit about it, I say let us try to put the quantum equations in a form 
as close as possible to classical equations so that we can see what the difficulty is 
and what happens. Well, first of all we can’t simulate $\psi$ in the normal way. As I’ve 
explained already, there’re too many variables. Our only hope is that we’re going 
to simulate probabilities, that we’re going to have our computer do things with the 
same probability as we observe in nature, as calculated by the quantum mechanical 
system. Can you make a cellular automaton, or something, imitate with the same 
probability what Nature does, where I’m going to suppose that quantum mechanics 
is correct, or at least after I discretize space and time it’s correct, and see if I can do 
it. I must point out that you must directly generate the probabilities, the results, 
with the correct quantum probability. Directly, because we have no way to store 
all the numbers, we have to just imitate the phenomenon directly. 


It turns out then that another thing, rather than the wave function, a thing 
called the density matrix, is much more useful for this. It’s not so useful as far 
as the mathematical equations are concerned, since it’s more complicated than the 
equations for $\psi$, but I’m not going to worry about mathematical complications, or 
which is the easiest way to calculate, because with computers we don’t have to be so 
careful to do it the very easiest way. And so with a slight increase in the complexity 
of the equations (and not very much increase) I turn to the density matrix, which 
for a single particle of coordinate $x$ in a pure state of wave function $\psi(x)$ is 


$$\rho(x, x') = \psi^*(x)\,\psi(x')$$


This has the special property that it is a function of two coordinates $x$, $x'$. The presence 
of two quantities $x$ and $x'$ associated with each coordinate is analogous to the fact 
that in classical mechanics you have to have two variables to describe the state, $x$ and 
$\dot{x}$. States are described by a second-order device, with two informations (“position” 
and “velocity”). So we have to have two pieces of information associated with a 
particle, analogous to the classical situation, in order to describe configurations. 
(I’ve written the density matrix for one particle, but of course there’s the analogous 
thing for R particles, a function of 2R variables). 


This quantity has many of the mathematical properties of a probability. For 
example if a state $\psi(x)$ is not certain but is $\psi_\alpha$ with the probability $p_\alpha$, then the 
density matrix is the appropriate weighted sum of the matrix for each state $\alpha$: 


$$\rho(x, x') = \sum_\alpha p_\alpha\, \psi_\alpha^*(x)\, \psi_\alpha(x')$$


A quantity which has properties even more similar to classical probabilities is the 
Wigner function, a simple reexpression of the density matrix; for a single particle 


$$W(x, p) = \int \rho(x + Y, x - Y)\, e^{ipY}\, dY$$


We shall be emphasizing their similarity and shall call it “probability” in quotes 
instead of Wigner function. Watch these quotes carefully, when they are absent we 
mean the real probability. If “probability” had all the mathematical properties of a 
probability we could remove the quotes and simulate it. $W(x, p)$ is the “probability” 
that the particle has position $x$ and momentum $p$ (per $dx$ and $dp$). What properties 
does it have that are analogous to an ordinary probability? 


It has the property that if there are many variables and you want to know 
the “probabilities” associated with a finite region, you simply disregard the other 
variables (by integration). Furthermore the probability of finding a particle at $x$ is 
$\int W(x, p)\, dp$. If you can interpret $W$ as a probability of finding $x$ and $p$, this would 
be an expected equation. Likewise the probability of $p$ would be expected to be 
$\int W(x, p)\, dx$. These two equations are correct, and therefore you would hope that 
maybe $W(x, p)$ is the probability of finding $x$ and $p$. And the question then is can 
we make a device which simulates this W? Because then it would work fine. 


Since the quantum systems I noted were best represented by spin one-half (oc- 
cupied versus unoccupied or spin one-half is the same thing), I tried to do the same 
thing for spin one-half objects, and it’s rather easy to do. Although before, one ob- 
ject only had two states, occupied and unoccupied, the full description — in order 
to develop things as a function of time — requires twice as many variables, which 
mean two slots at each point which are occupied or unoccupied (denoted by + and 
− in what follows), analogous to the $x$ and $\dot{x}$, or the $x$ and $p$. So you can find four 
numbers, four “probabilities” $\{f_{++}, f_{+-}, f_{-+}, f_{--}\}$ which act just like, and I have 
to explain why they’re not exactly like, but they act just like probabilities to find 
things in the state in which both symbols are up, one’s up and one’s down, and so 
on. For example, the sum $f_{++} + f_{+-} + f_{-+} + f_{--}$ of the four “probabilities” is 1. 
You’ll remember that one object now is going to have two indices, two plus/minus 
indices, or two ones and zeros at each point, although the quantum system had only 
one. For example, if you would like to know whether the first index is positive, the 
probability of that would be 


$$\text{Prob(first index is } +) = f_{++} + f_{+-} \qquad \text{[spin $z$ up]}$$


i.e., you don’t care about the second index. The probability that the first index is 
negative is 


$$\text{Prob(first index is } -) = f_{-+} + f_{--} \qquad \text{[spin $z$ down]}$$


These two formulas are exactly correct in quantum mechanics. You see I’m hedging 
on whether or not “probability” f can really be a probability without quotes. But 
when I write probability without quotes on the left-hand side I’m not hedging; that 
really is the quantum mechanical probability. It’s interpreted perfectly fine here. 
Likewise the probability that the second index is positive can be obtained by finding 


$$\text{Prob(second index is } +) = f_{++} + f_{-+} \qquad \text{[spin $x$ up]}$$

and likewise 

$$\text{Prob(second index is } -) = f_{+-} + f_{--} \qquad \text{[spin $x$ down]}$$


You could also ask other questions about the system. You might like to know, 
“What is the probability that both indices are positive?” You'll get in trouble. 
But you could ask other questions that you won’t get in trouble with, and that get 
correct physical answers. You can ask, for example, “What is the probability that 
the two indices are the same?” That would be 


$$\text{Prob(match)} = f_{++} + f_{--} \qquad \text{[spin $y$ up]}$$

Or the probability that there’s no match between the indices, that they’re different, 

$$\text{Prob(no match)} = f_{+-} + f_{-+} \qquad \text{[spin $y$ down]}$$


All perfectly all right. All these probabilities are correct and make sense, and have 
a precise meaning in the spin model, shown in the square brackets above. There 
are other “probability” combinations, other linear combinations of these f’s which 
also make physically sensible probabilities, but I won’t go into those now. There 
are other linear combinations that you can ask questions about, but you don’t seem 
to be able to ask questions about an individual f. 


11.6 Negative Probabilities 


Now, for many interacting spins on a lattice we can give a “probability” (the quotes 
remind us that there is still a question about whether it’s a probability) for corre- 
lated possibilities: 


$$F(s_1, s_2, \ldots, s_N) \qquad (s_i \in \{++, +-, -+, --\})$$


Next, if I look for the quantum mechanical equation which tells me what the changes 
of $F$ are with time, they are exactly of the form that I wrote above for the classical 
theory: 


$$F_{t+1}(\{s\}) = \sum_{\{s'\}} \left[ \prod_i M(s_i \mid s'_j, s'_k, \ldots) \right] F_t(\{s'\})$$


but now we have $F$ instead of $P$. The $M(s_i \mid s'_j, s'_k, \ldots)$ would appear to be 
interpreted as the “probability” per unit time, or per time jump, that the state at $i$ turns 
into $s_i$ when the neighbors are in configuration $s'$. If you can invent a probability 
$M$ like that, you write the equations for it according to normal logic, those are the 
correct equations, the real, correct, quantum mechanical equations for this $F$, and 
therefore you’d say, “Okay, so I can imitate it with a probabilistic computer!” 


There’s only one thing wrong. These equations unfortunately cannot be so 
interpreted on the basis of the so-called “probability”, or this probabilistic computer 
can’t simulate them, because the F is not necessarily positive. Sometimes it’s 
negative! The M, the “probability” (so-called) of moving from one condition to 
another is itself not positive; if I had gone all the way back to the f for a single 
object, it again is not necessarily positive. An example of possibilities here are 


$$f_{++} = 0.6 \qquad f_{+-} = -0.1 \qquad f_{-+} = 0.3 \qquad f_{--} = 0.2$$


The sum $f_{++} + f_{+-}$ is 0.5, that’s 50% chance of finding the first index positive. The 
probability of finding the first index negative is the sum $f_{-+} + f_{--}$ which is also 
50%. The probability of finding the second index positive is the sum $f_{++} + f_{-+}$ 
which is nine tenths, the probability of finding it negative is $f_{+-} + f_{--}$ which is 
one-tenth, perfectly alright, it’s either plus or minus. The probability that they 
match is eight-tenths, the probability that they mismatch is plus two-tenths; every 
physical probability comes out positive. But the original f’s are not positive, and 
therein lies the great difficulty. The only difference between a probabilistic classical 
world and the equations of the quantum world is that somehow or other it appears 
as if the probabilities would have to go negative, and that we do not know, as far as 
I know, how to simulate. Okay, that’s the fundamental problem. I don’t know the 
answer to it, but I wanted to explain that if I try my best to make the equations look 
as near as possible to what would be imitable by a classical probabilistic computer, 
I get into trouble. 
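The arithmetic of this example is easy to check. The sketch below takes the four example values quoted above, with one of them negative, and forms the six combinations that correspond to physical questions. The dictionary layout and function names are assumptions made for the illustration.

```python
# Example "probabilities" from the text; the key is (first index, second index).
F = {"++": 0.6, "+-": -0.1, "-+": 0.3, "--": 0.2}


def prob_first(f, sign):
    """Probability that the first index has the given sign (spin z up/down)."""
    return sum(v for k, v in f.items() if k[0] == sign)


def prob_second(f, sign):
    """Probability that the second index has the given sign (spin x up/down)."""
    return sum(v for k, v in f.items() if k[1] == sign)


def prob_match(f):
    """Probability that the two indices agree (spin y up)."""
    return f["++"] + f["--"]


def prob_mismatch(f):
    """Probability that the two indices differ (spin y down)."""
    return f["+-"] + f["-+"]


if __name__ == "__main__":
    print(prob_first(F, "+"), prob_first(F, "-"))
    print(prob_second(F, "+"), prob_second(F, "-"))
    print(prob_match(F), prob_mismatch(F))
    print(min(F.values()))
```

Every combination that answers a physical question lies between 0 and 1, yet no ordinary random process can draw a configuration with "probability" minus one tenth, which is the difficulty described in the text.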


11.7 Polarization of Photons — Two-State Systems 


I would like to show you why such minus signs cannot be avoided, or at least 
that you have some sort of difficulty. You probably have all heard this example 
of the Einstein-Podolsky-Rosen paradox, but I will explain this little example of a 
physical experiment which can be done, and which has been done, which does give 
the answers quantum theory predicts, and the answers are really right, there’s no 
mistake, if you do the experiment, it actually comes out. And I’m going to use the 
example of polarizations of photons, which is an example of a two-state system. 
When a photon comes, you can say it’s either x polarized or y polarized. You can 
find that out by putting in a piece of calcite, and the photon goes through the 
calcite either out in one direction, or out in another — actually slightly separated, 
and then you put in some mirrors, that’s not important. You get two beams, two 
places out, where the photon can go (See Figure 11.2). 


Fig. 11.3. 


If you put a polarized photon in, then it will go to one beam called the ordinary 
ray, or another, the extraordinary one. If you put detectors there you find that 
each photon that you put in, it either comes out in one or the other 100% of the 
time, and not half and half. You either find a photon in one or the other. The 
probability of finding it in the ordinary ray plus the probability of finding it in the 
extraordinary ray is always 1 — you have to have that rule. That works. And 
further, it’s never found at both detectors. (If you might have put two photons in, 
you could get that, but you cut the intensity down — it’s a technical thing, you 
don’t find them in both detectors.) 


Now the next experiment: Separation into 4 polarized beams (see Figure 11.3). 
You put two calcites in a row so that their axes have a relative angle $\phi$. I happen 
to have drawn the second calcite in two positions, but it doesn’t make a difference 
if you use the same piece or not, as you care. Take the ordinary ray from one 
and put it through another piece of calcite and look at its ordinary ray, which I’ll 
call the ordinary-ordinary (O-O) ray, or look at its extraordinary ray, I have the 
ordinary-extraordinary (O-E) ray. And then the extraordinary ray from the first 
one comes out as the E-O ray, and then there’s an E-E ray, alright. Now you 
can ask what happens. 


Fig. 11.4. 


You'll find the following. When a photon comes in, you always find that only 
one of the four counters goes off. 


If the photon is O from the first calcite, then the second calcite gives O-O with
probability $\cos^2\phi$ or O-E with the complementary probability
$1 - \cos^2\phi = \sin^2\phi$. Likewise an E photon gives an E-O with the
probability $\sin^2\phi$ or an E-E with the probability $\cos^2\phi$.
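
The four counter probabilities can be tabulated with a short Python sketch. The function name `four_beam_probabilities` and the dictionary labels are illustrative assumptions, and the photon is taken to be already known as O or E after the first calcite.

```python
import math


def four_beam_probabilities(phi_deg, first="O"):
    """Counter probabilities for two calcites at relative angle phi_deg.

    `first` is the ray ("O" or "E") the photon took out of the first calcite.
    """
    c2 = math.cos(math.radians(phi_deg)) ** 2  # cos^2(phi)
    s2 = 1.0 - c2                              # sin^2(phi)
    if first == "O":
        return {"O-O": c2, "O-E": s2, "E-O": 0.0, "E-E": 0.0}
    return {"O-O": 0.0, "O-E": 0.0, "E-O": s2, "E-E": c2}
```

For any angle the four entries add up to 1, which is the statement that exactly one of the four counters goes off.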


11.8 Two-Photon Correlation Experiment 

Let us turn now to the two photon correlation experiment (see Figure 11.4). What
can happen is that an atom emits two photons in opposite directions (e.g., the
$3s \to 2p \to 1s$ transition in the H atom). They are observed simultaneously (say,
by you and by me) through two calcites set at $\phi_1$ and $\phi_2$ to the vertical.
Quantum theory and experiment agree that the probability $P_{OO}$ that both of us
detect an ordinary photon is

$$P_{OO} = \tfrac{1}{2}\cos^2(\phi_2 - \phi_1).$$

The probability $P_{EE}$ that we both observe an extraordinary ray is the same:

$$P_{EE} = \tfrac{1}{2}\cos^2(\phi_2 - \phi_1).$$

The probability $P_{OE}$ that I find O and you find E is

$$P_{OE} = \tfrac{1}{2}\sin^2(\phi_2 - \phi_1),$$

and finally the probability $P_{EO}$ that I measure E and you measure O is

$$P_{EO} = \tfrac{1}{2}\sin^2(\phi_2 - \phi_1).$$


Notice that you can always predict, from your own measurement, what I shall get,
O or E. For any axis $\phi_1$ that I choose, just set your axis $\phi_2$ to $\phi_1$; then

$$P_{OE} = P_{EO} = 0$$

and I must get whatever you get.
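
As a quick numerical check of these four formulas, here is a minimal Python sketch. The function names `joint_probabilities` and `agreement` are assumptions made for illustration only.

```python
import math


def joint_probabilities(phi1_deg, phi2_deg):
    """Quantum joint probabilities for the two-photon experiment."""
    d = math.radians(phi2_deg - phi1_deg)
    same = 0.5 * math.cos(d) ** 2  # P_OO and P_EE
    diff = 0.5 * math.sin(d) ** 2  # P_OE and P_EO
    return {"OO": same, "EE": same, "OE": diff, "EO": diff}


def agreement(phi1_deg, phi2_deg):
    """Probability that both observers get the same result."""
    p = joint_probabilities(phi1_deg, phi2_deg)
    return p["OO"] + p["EE"]
```

With equal axes the "OE" and "EO" entries vanish, so each observer can predict the other's result with certainty; at a relative angle of 30° the agreement is $\cos^2 30^\circ = 3/4$.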


Let us see now how it would have to be for a local probabilistic computer. Photon
1 must be in some condition $\alpha$ with the probability $f_\alpha(\phi_1)$ that determines
it to go through as an ordinary ray [the probability it would pass as E is
$1 - f_\alpha(\phi_1)$]. Likewise photon 2 will be in a condition $\beta$ with probability
$g_\beta(\phi_2)$. If $p_{\alpha\beta}$ is the conjoint probability to find the condition pair
$\alpha$, $\beta$, the probability $P_{OO}$ that both of us observe O rays is

$$P_{OO}(\phi_1, \phi_2) = \sum_{\alpha\beta} p_{\alpha\beta}\, f_\alpha(\phi_1)\, g_\beta(\phi_2), \qquad \sum_{\alpha\beta} p_{\alpha\beta} = 1,$$

likewise

$$P_{OE}(\phi_1, \phi_2) = \sum_{\alpha\beta} p_{\alpha\beta}\, \bigl(1 - f_\alpha(\phi_1)\bigr)\, g_\beta(\phi_2), \quad \text{etc.}$$

The conditions $\alpha$ determine how the photons go. There’s some kind of correlation
of the conditions. Such a formula cannot reproduce the quantum results above for
any $p_{\alpha\beta}$, $f_\alpha(\phi_1)$, $g_\beta(\phi_2)$ if they are real probabilities, that is, all
positive, although it is easy if they are “probabilities” that are negative for some
conditions or angles. We now analyze why that is so.


I don’t know what kinds of conditions they are, but for any condition the probability
$f_\alpha(\phi)$ of its being extraordinary or ordinary in any direction must be either
one or zero. Otherwise you couldn’t predict it on the other side. You would be 
unable to predict with certainty what I was going to get, unless, every time the 
photon comes here, which way it’s going to go is absolutely determined. Therefore, 
whatever condition the photon is in, there is some hidden inside variable that’s 
going to determine whether it’s going to be ordinary or extraordinary. This deter- 
mination is done deterministically, not probabilistically; otherwise we can’t explain 
the fact that you could predict what I was going to get exactly. So let us suppose 
that something like this happens. Suppose we discuss results just for angles which 
are multiples of 30°. 


Fig. 11.5. (Panels: my photon; your photon.)


On each diagram (Figure 11.5) are the angles 0°, 30°, 60°, 90°, 120°, and 150°.
A particle comes out to me, and it’s in some sort of state, so what it’s going to give
for 0°, for 30°, etc. are all predicted (determined) by the state. Let us say that
in a particular state that is set up the prediction for 0° is that it'll be extraordinary 
(black dot), for 30° it’s also extraordinary, for 60° it’s ordinary (white dot), and so 
on (Figure 11.5a). By the way, the outcomes are complements of each other at right 
angles, because, remember, it’s always either extraordinary or ordinary; so if you 
turn 90°, what used to be an ordinary ray becomes the extraordinary ray. Therefore, 
whatever condition it’s in, it has some predictive pattern in which you either have 
a prediction of ordinary or of extraordinary — three and three — because at right 
angles they’re not the same color. Likewise the particle that comes to you when 
they’re separated must have the same pattern because you can determine what I’m 
going to get by measuring yours. Whatever circumstances come out, the patterns 
must be the same. So, if I want to know, “Am I going to get white at 60°?” You 
just measure at 60°, and you'll find white, and therefore you'll predict white, or 
ordinary, for me. Now each time we do the experiment the pattern may not be the 
same. Every time we make a pair of photons, repeating this experiment again and 
again, it doesn’t have to be the same as Figure 11.5a. Let’s assume that the next 
time we do the experiment my photon will be O or E for each angle as in Figure 11.5c.
Then your pattern looks like Figure 11.5d. But whatever it is, your pattern has to 
be my pattern exactly — otherwise you couldn’t predict what I was going to get 
exactly by measuring the corresponding angle. And so on. Each time we do the 
experiment, we get different patterns; and it’s easy: There are just six dots and 
three of them are white, and you chase them around in different ways; everything
can happen. If we measure at the same angle, we always find that with this kind 
of arrangement we would get the same result. 


Now suppose we measure at $\phi_2 - \phi_1 = 30^\circ$, and ask, “With what probability do
we get the same result?” Let’s first try this example here (Figure 11.5a, 11.5b).
With what probability would we get the same result, that they’re both white, or 
they’re both black? The thing comes out like this: Suppose I say, “After they come 
out, I’m going to choose a direction at random, I tell you to measure 30° to the right 
of that direction.” Then whatever I get, you would get something different if the 
neighbors were different. (We would get the same if the neighbors were the same.) 
What is the chance that you get the same result as me? The chance is the number 
of times that the neighbor is the same color. If you'll think a minute, you'll find that 
two thirds of the time, in the case of Figure 11.5a, it’s the same color. The worst 
case would be black/white/black/white/black/white, and there the probability of a
match would be zero (Figure 11.5c, 11.5d). If you look at all eight possible distinct 
cases, you'll find that the biggest possible answer is two-thirds. You cannot arrange, 
in a classical kind of method like this, that the probability of agreement at 30° will 
be bigger than two-thirds. But the quantum mechanical formula predicts $\cos^2 30^\circ$
(or 3/4), and experiments agree with this, and therein lies the difficulty. That’s
all. That’s the difficulty. That’s why quantum mechanics can’t seem to be imitable 
by a local classical computer. 
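
The counting argument can be checked by brute force. The sketch below is an illustration, not part of the lecture: it assumes that a pattern is fixed by its outcomes at 0°, 30° and 60° (the outcomes at 90°, 120° and 150° being their complements), and that 180° counts as the same setting as 0°. The names `full_pattern`, `agreement_at_30` and `all_agreements` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def full_pattern(first_three):
    """Extend outcomes at 0, 30, 60 degrees to 90, 120, 150 degrees.

    True/False stand for the two dot colors; turning by 90 degrees
    swaps ordinary and extraordinary, so the second half is the complement.
    """
    return list(first_three) + [not x for x in first_three]


def agreement_at_30(pattern):
    """Fraction of angles whose neighbor 30 degrees away has the same color."""
    n = len(pattern)
    matches = sum(pattern[i] == pattern[(i + 1) % n] for i in range(n))
    return Fraction(matches, n)


def all_agreements():
    """Agreement probability for each of the eight distinct patterns."""
    return {
        bits: agreement_at_30(full_pattern(bits))
        for bits in product([False, True], repeat=3)
    }


best_classical = max(all_agreements().values())  # largest value over the 8 cases
quantum = Fraction(3, 4)                         # cos^2(30 degrees)
```

Under these assumptions no pattern exceeds an agreement of two-thirds, and the alternating pattern gives zero, so any mixture of patterns also stays at or below two-thirds, short of the quantum value 3/4.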


I’ve entertained myself always by squeezing the difficulty of quantum mechanics 
into a smaller and smaller place, so as to get more and more worried about this 
particular item. It seems to be almost ridiculous that you can squeeze it to a 
numerical question that one thing is bigger than another. But there you are — 
it is bigger than any logical argument can produce, if you have this kind of logic. 
Now, we say “this kind of logic”; what other possibilities are there? Perhaps there 
may be no possibilities, but perhaps there are. It’s interesting to try to discuss 
the possibilities. I mentioned something about the possibility of time — of things 
being affected not just by the past, but also by the future, and therefore that our 
probabilities are in some sense “illusory.” We only have the information from the 
past, and we try to predict the next step, but in reality it depends upon the near 
future which we can’t get at, or something like that. A very interesting question 
is the origin of the probabilities in quantum mechanics. Another way of putting 
things is this: We have an illusion that we can do any experiment that we want. We 
all, however, come from the same universe, have evolved with it, and don’t really 
have any “real” freedom. For we obey certain laws and have come from a certain 
past. Is it somehow that we are correlated to the experiments that we do, so that 
the apparent probabilities don’t look like they ought to look if you assume that they 
are random? There are all kinds of questions like this, and what I’m trying to do is
to get you people who think about computer-simulation possibilities to pay a great 
deal of attention to this, to digest as well as possible the real answers of quantum 
mechanics, and see if you can’t invent a different point of view than the physicists 
have had to invent to describe this. In fact the physicists have no good point of 
view. Somebody mumbled something about a many-world picture, and that many-world
picture says that the wave function $\psi$ is what’s real, and damn the torpedoes if
there are so many variables, $N^R$. All these different worlds and every arrangement




of configurations are all there just like our arrangement of configurations, we just 
happen to be sitting in this one. It’s possible, but I’m not very happy with it. 


So, I would like to see if there’s some other way out, and I want to emphasize, 
or bring the question here, because the discovery of computers and the thinking 
about computers has turned out to be extremely useful in many branches of human 
reasoning. For instance, we never really understood how lousy our understand- 
ing of languages was, the theory of grammar and all that stuff, until we tried to 
make a computer which would be able to understand language. We tried to learn a 
great deal about psychology by trying to understand how computers work. There 
are interesting philosophical questions about reasoning, and relationship, observa- 
tion, and measurement and so on, which computers have stimulated us to think 
about anew, with new types of thinking. And all I was doing was hoping that the 
computer-type of thinking would give us some new ideas, if any are really needed. 
I don’t know, maybe physics is absolutely okay the way it is. The program that 
Fredkin is always pushing, about trying to find a computer simulation of physics, 
seems to me to be an excellent program to follow out. He and I have had wonderful,
intense, and interminable arguments, and my argument is always that the
real use of it would be with quantum mechanics, and therefore full attention and
acceptance of the quantum mechanical phenomena — the challenge of explaining 
quantum mechanical phenomena — has to be put into the argument, and therefore 
these phenomena have to be understood very well in analyzing the situation. And 
I'm not happy with all the analyses that go with just the classical theory, because 
Nature isn’t classical, dammit, and if you want to make a simulation of Nature, 
you'd better make it quantum mechanical, and by golly it’s a wonderful problem, 
because it doesn’t look so easy. Thank you. 


11.9 Discussion 


Question: Just to interpret, you spoke first of the probability of A given B, versus 
the probability of A and B jointly — that’s the probability of one observer seeing 
the result, assigning a probability to the other; and then you brought up the paradox 
of the quantum mechanical result being 3/4, and this being 2/3. Are those really 
the same probabilities? Isn’t one a joint probability, and the other a conditional 
one? 


Answer: No, they are the same. $P_{OO}$ is the joint probability that both you and
I observe an ordinary ray, and $P_{EE}$ is the joint probability for two extraordinary
rays. The probability that our observations match is

$$P_{OO} + P_{EE} = \cos^2 30^\circ = 3/4.$$


Question: Does it in some sense depend upon an assumption as to how much
information is accessible from the photon, or from the particle? And second, to
take your question of prediction, your comment about predicting, is in some sense
reminiscent of the philosophical question, “Is there any meaning to the question of
whether there is free will or predestination?”, namely, the correlation between the
observer and the experiment, and the question there is, “Is it possible to construct
a test in which the prediction could be reported to the observer?”, or instead, has
the ability to represent information already been used up? And I suspect that you
may have already used up all the information so that prediction lies outside the
range of the theory.


Answer: All these things I don’t understand; deep questions, profound questions. 
However, physicists have a kind of a dopey way of avoiding all of these things. They
simply say, now look, friend, you take a pair of counters and you put them on 
the side of your calcite and you count how many times you get this stuff, and it 
comes out 75% of the time. Then you go and you say, “Now can I imitate that 
with a device which is going to produce the same results, and which will operate 
locally”, and you try to invent some kind of way of doing that, and if you do it 
in the ordinary way of thinking, you find that you can’t get there with the same 
probability. Therefore some new kind of thinking is necessary, but physicists, being 
kind of dull minded, only look at Nature, and don’t know how to think in these 
new ways. 


Question: At the beginning of your talk, you talked about discretizing various 
things in order to go about doing a real computation of physics. And yet it seems 
to me that there are some differences between things like space and time, and 
probability that might exist at some place, or energy, or some field value. Do you 
see any reason to distinguish between quantization or discretizing of space and time, 
versus discretizing any of the specific parameters or values that might exist? 


Answer: I would like to make a few comments. You said quantizing or discretiz- 
ing. That’s very dangerous. Quantum theory and quantizing is a very specific type 
of theory. Discretizing is the right word. Quantizing is a different kind of math- 
ematics. If we talk about discretizing... of course I pointed out that we're going 
to have to change the laws of physics. Because the laws of physics as written now 
have, in the classical limit, a continuous variable everywhere, space and time. If, for 
example, in your theory you were going to have an electric field, then the electric 
field could not have (if it’s going to be imitable, computable by a finite number 
of elements) an infinite number of possible values, it’d have to be digitized. You 
might be able to get away with a theory by redescribing things without an electric 
field, but supposing for a moment that you've discovered that you can’t do that 
and you want to describe it with an electric field, then you would have to say that, 
for example, when fields are smaller than a certain amount, they aren’t there at all, 
or something. And those are very interesting problems, but unfortunately they’re 
not good problems for classical physics because if you take the example of a star 
a hundred light years away, and it makes a wave which comes to us, and it gets 
weaker, and weaker, and weaker, and weaker, the electric field’s going down, down, 
down, how low can we measure? You put a counter out there and you find “clunk,”
and nothing happens for a while, “clunk,” and nothing happens for a while. It’s not
discretized at all, you never can measure such a tiny field, you don’t find a tiny field, 
you don’t have to imitate such a tiny field, because the world that you're trying 
to imitate, the physical world, is not the classical world, and it behaves differently. 
So the particular example of discretizing the electric field, is a problem which I 
would not see, as a physicist, as fundamentally difficult, because it will just mean 
that your field has gotten so small that I had better be using quantum mechanics 
anyway, and so you’ve got the wrong equations, and so you did the wrong problem! 
That’s how I would answer that. Because you see, if you would imagine that the 
electric field is coming out of some ‘ones’ or something, the lowest you could get
would be a full one, but that’s what we see, you get a full photon. All these things
suggest that it’s really true, somehow, that the physical world is representable in a 
discretized way, because every time you get into a bind like this, you discover that 
the experiment does just what’s necessary to escape the trouble that would come if 
the electric field went to zero, or you’d never be able to see a star beyond a certain 
distance, because the field would have gotten below the number of digits that your 
world can carry.


QUANTUM ROBOTS


Paul Benioff 


Abstract 


Validation of a presumably universal theory, such as quantum mechanics, requires a 
quantum mechanical description of systems that carry out theoretical calculations 
and systems that carry out experiments. The description of quantum computers is
under active development. No description of systems to carry out experiments has 
been given. A very small step in this direction is taken here by giving a descrip- 
tion of quantum robots as mobile systems with on board quantum computers that 
interact with environments of quantum systems. The dynamics of these systems 
are described in terms of tasks that consist of sequences of computation and action 
phases. For each task a step operator is defined which can then be used to define 
finite time interval step dynamics or a task Hamiltonian. A specific task carried 
out on a very simple environment is used to illustrate the models. 


12.1 Introduction 


Much of the impetus to study quantum computation, either as networks of quantum 
gates [1, 2] (see [3] for a review) or as Quantum Turing Machines [4-8], is based
on the increased efficiency of quantum computers compared to classical computers
for solving some important problems [9, 10]. Realization of this goal or use of
quantum computers to simulate other physical systems [6, 11] requires the eventual
physical construction of quantum computers. However, as emphasized repeatedly 
by Landauer [12], there are serious obstacles to such a physical realization. 


There is, however, another reason to study quantum computers that is less 
dependent on whether or not such machines are ever built. It is based on the fact 
that testing the validity of a physical theory such as quantum mechanics requires the 
comparison of numerical values calculated from theory with experimental results. If 
quantum mechanics is universally valid (and there is no reason to assume otherwise), 
then both the systems that carry out theoretical calculations and the systems that 
carry out experiments must be described within quantum mechanics. It follows 
that the systems that test the validity of quantum mechanics must be described by 
the same theory whose validity they are testing. That is, quantum mechanics must 
describe its own validation to the maximum extent possible [13]. 


Because of these self referential aspects, limitations in mathematical systems 
expressed by the Gödel theorems lead one to expect that there may be interesting
questions of self consistency and limitations in such a description. Limitations on
self observation by quantum automata [14-16] may also play a role here. 


In order to investigate these questions it is necessary to have well defined com- 
pletely quantum mechanical descriptions of systems that compute theoretical values 
and of systems that carry out experiments. So far there has been much work on 
quantum computers. These are systems that can, in principle at least, carry out 
computation of theoretical values for comparison with experiment. However there 
has been no comparable development of a quantum mechanical description of robots. 
These are systems that can, in principle at least, carry out experiments. 


Another reason quantum robots are interesting is that it is possible that they 
might provide a very small first step towards a quantum mechanical description 
of systems that are aware of their environment, make decisions, are intelligent, 
and create theories such as quantum mechanics [17-19]. If quantum mechanics is 
universal, then these systems must also be described in quantum mechanics to the 
maximum extent possible. 


The main point of this paper is that quantum robots and their interactions 
with environments may provide a well defined platform for investigation of many 
interesting questions generated by the above considerations. To this end some 
general aspects of quantum robots and their interactions with environments are 
discussed in the next section. A quantum robot is defined as a mobile system 
consisting of an on board quantum computer and needed ancillary systems that 
moves in and interacts with an environment of quantum systems. The concept of 
tasks, as sequences of computations and actions, carried out by quantum robots is 
also introduced. 


Section 12.3 provides a more detailed description of quantum robots and gives a 
dynamical model for quantum robots interacting with environments. The on board 
computer is taken to be a quantum Turing machine consisting of a multistate head 
moving on a closed lattice of qubits and ancillary output, memory and control 
systems. The dynamics is defined in terms of step operators that are a sum of 
computation and action phase step operators. These can be used to describe finite 
time interval step dynamics or used to construct a Hamiltonian based on Feynman’s 
prescription [20]. Locality and other conditions that the step operators must satisfy 
are discussed in detail. 


A specific example of a task for a quantum robot in a very simple example of 
an environment is analyzed in detail in Section 12.4. The environment consists of a 
single spinless particle on a 1-D space lattice and the task is “search to the right for 
the particle, if found bring back to the initial quantum robot location”. Detailed 
properties of the action and computation phases are given with some mathematical 
aspects discussed in the Appendix. 


The last section contains a discussion of additional aspects. The importance of 
having a well defined quantum mechanical platform for asking relevant questions is
amplified. Also the speculative possibility of a Church-Turing type hypothesis for
the class of physical experiments is noted. 


It must be emphasized that the language used in this paper to describe quantum 
robots is carefully chosen to avoid any suggestions that these systems are aware of 
their environment, make decisions, carry out experiments or make measurements, 
or have other properties characteristic of intelligent or conscious systems. The 
quantum robots described here have no awareness of their environment and do not 
make decisions or measurements. Their description differs only in detail from that
used to describe any other system in quantum mechanics. 


Some aspects of the ideas presented here have already occurred in earlier work. 
Physical operations have been described as instructions for well-defined realizable 
and reproducible procedures [21], and quantum state preparation and observation 
procedures have been described as instruction booklets or programs for robots [22]. 
However these concepts were not described in detail and the possibility of describing 
these procedures or operations quantum mechanically was not mentioned. Also 
quantum computers had not yet been described. More recently Helon and Milburn 
[23] have described the use of the electronic states of ions in a linear ion trap as an
apparatus (and a quantum computer register) to measure properties of vibrational 
states of the ions. In other work quantum mechanical Maxwell’s demons have been 
described [24]. 


Also there is much work on the interactions between quantum computers and 
the environment. However, these interactions are considered as a source of noise or 
errors to be minimized or corrected by use of quantum error correction codes [25]. 
Here interactions between a quantum robot and the environment are emphasized 
as an essential part of the overall system dynamics. Other work on environmentally 
induced superselection rules [26, 27] also emphasizes interactions between the
environment and a measurement apparatus that stabilize a selected basis (the pointer
basis) of states of the apparatus. 


12.2 Quantum Robots 


Here quantum robots are considered to be mobile systems that have a quantum 
computer on board and any other needed ancillary systems. Quantum robots move 
in and interact (locally) with environments of quantum systems. Since quantum 
robots are mobile, they are limited to be quantum systems with finite numbers of 
degrees of freedom. 


Environments consist of arbitrary numbers and types of systems moving in 1-,
2-, or 3-dimensional spatial universes. The component systems can have spin or 
other internal quantum numbers and can interact with one another or be free. En- 
vironments can be open or closed. If they are open then there may be systems that 
remain for all time outside the domain of interaction with the quantum robot that
can interact with and establish correlations with other environment systems in the
domain of the robot. Quantum field theory may be useful to describe environments
containing an infinite number of degrees of freedom. To keep things simple, in this 
paper environments will be considered to consist of systems in discrete space lattices 
instead of in continuous space. 


The quantum computer that is on board the quantum robot can be described 
as a quantum Turing machine, a network of quantum gates, or any other suitable
model. If it is a quantum Turing machine, it consists of a finite state head moving 
on a finite lattice of qubits. The lattice can have distinct ends. However it seems 
preferable if the lattice is closed (i.e. cyclic). If the on board computer is a network 
of quantum gates then it should be a cyclic network with many closed internal 
quantum wire loops and a limited number of open input and output quantum wires 
(narrow bandwidth). Even though acyclic networks are sufficient for the purposes 
of quantum computation [28], cyclic ones are preferable for quantum robots. One
reason is that interactions between these networks and the environment are simpler 
to describe and understand than those containing a large number of input and 
output lines. Also the only known examples of very complex systems that are 
aware of their environment and are presumably intelligent, contain large numbers 
of internal loops and internal memory storage. 


The overall dynamics of a quantum robot and its interactions with the envi- 
ronment is described in terms of tasks. Tasks can be divided into different types 
according to their goals. For one type the goals can be described in terms of spec- 
ified changes in the state of the environmental systems. This type is similar to 
the association of functions with quantum computers in that a quantum computer, 
starting with a specified initial state containing the function argument, outputs a 
final state with the value of the function. 


Another type has as a goal the carrying out of a measurement by transfer of 
information from the environment to the quantum robot. Some tasks may combine 
both types of goals. Other types of goals may also be possible. 


An example of the first type of task is “move each system in region $R$ 3 sites
to the right if and only if the destination site is unoccupied.” Implementation of
such a task requires specification of a path to be taken by the quantum robot in
executing the task. Some method of determining when it is inside or outside of
the specified region and making appropriate movements must be available. In this
case if there are $n$ systems in region $R$ at locations $x_1, x_2, \cdots, x_n$, then
the initial state of the regional environment,
$|\underline{x}\rangle = \otimes_{j=1}^{n}|x_j\rangle$, becomes $\otimes_{j=1}^{n}|x_j + 3\rangle$
provided all destination sites are unoccupied.
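
The action of this task on a single position basis state can be sketched classically in Python. This is only bookkeeping for one basis state, not a quantum simulation. The function name `move_right` is hypothetical, and the sketch assumes that “unoccupied” refers to the configuration before any system has moved.

```python
def move_right(occupied, region, shift=3):
    """Apply the task to one basis state, given as a set of occupied sites.

    A system in `region` moves `shift` sites to the right if and only if
    its destination site is unoccupied in the initial configuration
    (an assumption of this sketch); all other systems stay where they are.
    """
    occupied = set(occupied)
    result = set()
    for x in occupied:
        if x in region and (x + shift) not in occupied:
            result.add(x + shift)
        else:
            result.add(x)
    return result
```

For example, `move_right({0, 1, 2}, {0, 1, 2})` gives `{3, 4, 5}`, the case in which every destination site is free.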


If the initial state of the regional environment is a linear superposition of states
$\psi = \sum_{\underline{x}} c_{\underline{x}}|\underline{x}\rangle$ of $n$-system position
states $|\underline{x}\rangle$ in $R$, then the final state of the regional
environment is given in general by a density operator even if all destination sites
are unoccupied. This is a consequence of the fact that in general the actions of the
quantum robot introduce correlations between the states of the robot systems and
the different initial environment component states $|\underline{x}\rangle$. When the task
is completed on all components $|\underline{x}\rangle$, the overall state of the robot plus
environment is given by a linear sum over robot regional environment states of the
form $\sum_{\underline{x}} c_{\underline{x}}\theta_{\underline{x}}|\underline{x}\rangle$. Here
$\theta_{\underline{x}}$ is the final state of the quantum robot resulting from carrying out
the task on the regional environment in state $|\underline{x}\rangle$. Taking the trace over
the robot system variables gives the density operator form for the regional
environment state.
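
The effect of the trace can be illustrated with a small pure-Python sketch. It assumes, for illustration only, that the robot states are labeled by members of an orthonormal set, so that distinct labels are orthogonal; the names `reduced_env_density` and `purity` are hypothetical.

```python
def reduced_env_density(state, env_basis):
    """Trace out the robot from a joint pure state.

    `state` maps (robot_label, env_label) -> amplitude. Robot labels are
    assumed to name orthonormal robot states (an assumption of this sketch).
    Returns rho as a dict keyed by (env_label, env_label).
    """
    robots = {r for r, _ in state}
    rho = {}
    for x in env_basis:
        for y in env_basis:
            rho[(x, y)] = sum(
                state.get((r, x), 0j) * state.get((r, y), 0j).conjugate()
                for r in robots
            )
    return rho


def purity(rho, env_basis):
    """Tr(rho^2): equals 1 for a pure state, less than 1 for a mixed one."""
    return sum(
        (rho[(x, y)] * rho[(y, x)]).real for x in env_basis for y in env_basis
    )
```

If the robot ends in a different orthogonal state for each environment component, the purity of the reduced state drops below 1; if the robot ends in the same state for every component, the environment stays pure.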


The above description shows that quantum robots can carry out the same task 
on many different environments simultaneously. This can be done by use of an 
initial state of the quantum robot plus environment that is a linear superposition 
of different environment basis states. For quantum computers the corresponding 
property of carrying out many computations in parallel has been known for some 
time [6]. Whether the speedup provided by this parallel tasking ability can be 
preserved for some tasks, as is the case for Shor’s [9] or Grover’s algorithms [10] for 
quantum computers, remains to be seen. 


The above described task is an example of a reversible task. There are also
many tasks that are irreversible. An example is the task “clean up the region $R$ of
the environment” where “clean up” has some specific description such as “move all
systems in $R$ to some fixed pattern”. This task is irreversible because many initial
states of systems in $R$ are taken into the same final state. This task can be made
reversible by storing somewhere in the environment outside of $R$ a copy of each
component in some basis $B$ of the initial state of the systems in $R$. For example if
$B = \{|\underline{x}\rangle\}$ and $\psi = \sum_{\underline{x}} c_{\underline{x}}|\underline{x}\rangle$
is the initial state, then the copy operation is given by
$\sum_{\underline{x}} c_{\underline{x}}|\underline{x}\rangle|\underline{0}\rangle_{cp} \to \sum_{\underline{x}} c_{\underline{x}}|\underline{x}\rangle|\underline{x}\rangle_{cp}$.

This operation of copying relative to the states in some basis avoids the limitations
imposed by the no-cloning theorem [29] because an unknown state $\psi$ is
not being copied. The price paid is that copying relative to some basis introduces 
branching into the process in that correlations are introduced between the state 
of systems in the copy region and states of systems in R. This is the quantum 
mechanical equivalent of the classical case of making a calculation of a many-one 
function reversible by copying and storing the input [30]. 
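
The difference between copying relative to a basis and cloning can be made concrete with a short Python sketch. States are represented as dictionaries from basis labels to amplitudes; the names `copy_relative_to_basis` and `product_clone` are hypothetical, and `product_clone` only writes down the state a cloner would have to produce, which no linear operation yields for all inputs.

```python
def copy_relative_to_basis(psi):
    """Map sum_x c_x |x>|0>_cp to sum_x c_x |x>|x>_cp.

    `psi` maps basis label x -> amplitude c_x. The result is keyed by
    (label in R, label in the copy region).
    """
    return {(x, x): c for x, c in psi.items()}


def product_clone(psi):
    """The product state psi (x) psi that a true clone would require."""
    return {(x, y): cx * cy for x, cx in psi.items() for y, cy in psi.items()}
```

For a single basis state the two results coincide. For a superposition the basis copy contains only the correlated terms $|x\rangle|x\rangle_{cp}$, while the product state also contains the cross terms, which is the branching described above.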


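In the above case carrying out the cleanup on the state
$\sum_{\underline{x}} c_{\underline{x}}|\underline{x}\rangle|\underline{x}\rangle_{cp}$
corresponds to the operation
$\sum_{\underline{x}} c_{\underline{x}}|\underline{x}\rangle|\underline{x}\rangle_{cp} \to |\underline{y}\rangle\sum_{\underline{x}} c_{\underline{x}}|\underline{x}\rangle_{cp}$
where $|\underline{y}\rangle$ is the clean up state for the region $R$. The overall
process is reversible as it can be described by the transformation
$\sum_{\underline{x}} c_{\underline{x}}|\underline{x}\rangle|\underline{0}\rangle_{cp} \to |\underline{y}\rangle\sum_{\underline{x}} c_{\underline{x}}|\underline{x}\rangle_{cp}$.
If the final state of the quantum robot depends on the initial state of the systems
in region $R$, then correlations remain and the overall transformation corresponding
to carrying out the cleanup task is given by
$\sum_{\underline{x}} c_{\underline{x}}|\underline{x}\rangle|\underline{0}\rangle_{cp}\theta_i \to |\underline{y}\rangle\sum_{\underline{x}} c_{\underline{x}}|\underline{x}\rangle_{cp}\theta_{\underline{x}}$.
Here $\theta_i$ and $\theta_{\underline{x}}$ are the initial and final states of the
quantum robot.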


Each task is considered here to consist of a sequence of computation and action 
phases. The purpose of each computation phase is to determine what action the
quantum robot should take in the following action phase. Information on the state of
local environmental systems may also be recorded. The input to the computation
includes the local state of the environment and any other pertinent information,
such as the output of the previous computation phase. During a computation phase
the robot does not move. Changes in the state of the environment are limited to
those resulting from observation. The states of an on board ancillary system, the
output system (o), are also changed. These states determine the action taken following
completion of the computation.


During each action phase the environment state in the neighborhood of the
quantum robot is changed or the quantum robot moves on the lattice. Neither or
both of these can also occur. Depending on the model used, each action phase can
consist of one step or several steps. Here one step consists of the robot moving to
at most an adjacent lattice site, or the local environment state changing, or both.
During an action phase the state of the (o) system, which determines the action to
be carried out, and the state of the on board quantum computer are not changed.
Also the quantum robot may or may not observe the local environment. Examples
of actions that do not and do require observations are “rotate the qubit (as a spin
system) by an angle $\phi$” and “rotate the qubit by an angle $\phi$ only if it is in state $|0\rangle$.
If the qubit is in state $|1\rangle$ move to an adjacent site.”


The description of tasks carried out by quantum robots requires the use of 
completion or halting flags to determine when individual action and computation 
phases are completed as well as when the overall task is completed. Such flags are 
necessary if the overall quantum robot plus environment dynamics is described by
a Hamiltonian because the unitarity of $e^{-iHt}$ requires that system motion occurs
somewhere even after the task is completed.


Note that there are many examples of tasks that never halt. Nonhalting of tasks 
can arise from several sources. The task may consist of a nonterminating sequence
of computation and action phases, or either a computation or an action phase may
never halt. An example of an action that is multistep, does not halt, and requires
local environment interactions at each step is “search along a path on a space lattice 
until a particle is found” where the path contains no particles at all. 


As is well known, there are many ways to define local interactions. In the 
interests of simplicity, a very local delta function interaction will be assumed in 
the following. That is the local environment for the quantum robot is limited to 
the environment at the quantum robot location. The same assumption is made for
the head moving on the qubit lattice in the model used for the on board quantum
computer.

Fig. 12.1. A Schematic Model of a Quantum Robot and its Environment. The environment
is a 3-D space lattice containing various types of quantum systems (not shown). The
quantum robot QR consists of an on board Quantum Turing machine, finite state memory
(m) and output (o) systems, and a control qubit (c). The on board QTM consists of a
finite closed lattice $\mathcal{L}_2$ of qubits and a finite state head $h_2$ that moves on $\mathcal{L}_2$. The location
(q) of a marker qubit is shown. The position $\underline{x} = x,y,z$ of the quantum robot on the
environment lattice is shown by an arrow.


12.3. Models of Quantum Robots plus Environments 


Here a model of quantum robots plus environments is described that illustrates the 
above material. To keep things simple the model will be limited to a description of 
information bearing degrees of freedom only. The relevance of this for the develop- 
ment of quantum computers has been noted by Landauer [31]. Also space will be 
considered to be discrete rather than continuous. 


As noted, quantum robots (QRs) consist of an on board quantum computer
and ancillary systems. Here the on board quantum computer will be taken to be
a quantum Turing machine consisting of a finite state head moving on a closed
circular lattice $\mathcal{L}_2$ of $N+1$ qubits. The states of N qubits are used for computation
and one qubit, taken to be ternary, is used for a marker. Ancillary systems present
are an output system (o), a memory system (m), and a control qubit (c). Both (o)
and (m) are described by quantum states in finite dimensional Hilbert spaces.


Simplifying assumptions for the environment include the use of discrete instead
of continuous space. As a result environments of quantum robots consist of 1-, 2-, or 3-D
space lattices containing arbitrary numbers of different types of systems. Simple
examples of environments consist of a 1-D lattice of qubits (which is a quantum
register) and a 1-D lattice containing just one spinless particle. The latter example
will be used to discuss a specific example of a task. Figure 12.1 shows a quantum
robot in a 3-D space lattice environment. Environment systems external to the
quantum robot are not shown. The location of the quantum robot in the lattice is
shown by an arrow.


Another simplifying assumption is that the only changes in the states of the 
environmental systems occur as a result of interacting with the quantum robot. 
The states are stationary in the absence of this interaction. This is done to avoid 
complications in describing task dynamics for environments of moving interacting 
systems. It is hoped to remove this restrictive assumption in future work. 


To each task is associated a step operator $T_{QR}$ that is used to describe the task
dynamics. Single task steps in the forward or backward time directions are described
respectively by $T_{QR}$ or $T_{QR}^\dagger$. If single task steps occur in a finite time interval $t$, then
$T_{QR}$ is required to be unitary with $T_{QR} = e^{-iHt}$ for some Hamiltonian H [6, 7, 32].
If infinitesimal time intervals are associated with $T_{QR}$, then $T_{QR}$ can be used to
directly construct a Hamiltonian according to [20]:


$$H = K(2 - T_{QR} - T_{QR}^\dagger) \tag{12.1}$$


where K is an arbitrary constant. In this model, which has been used elsewhere [5, 8],
$T_{QR}$ need not be unitary or even normal ($T_{QR}T_{QR}^\dagger \neq T_{QR}^\dagger T_{QR}$ is possible).
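The construction of Eq. 12.1 can be checked numerically on a toy example. The sketch below assumes the simplest possible step operator, a one-way shift along a finite path of n basis states, which is not unitary and not normal; the names `path_step_operator`, `hamiltonian_from_step` and `evolve`, and the values chosen for n, K and t, are illustrative assumptions only.

```python
import numpy as np

def path_step_operator(n):
    """Shift along a finite path: T|k> = |k+1>, and T annihilates the last state."""
    return np.eye(n, k=-1)

def hamiltonian_from_step(T, K=1.0):
    """H = K(2 - T - T^dagger), the form of Eq. 12.1."""
    return K * (2 * np.eye(T.shape[0]) - T - T.conj().T)

def evolve(H, psi0, t):
    """Apply exp(-iHt) to psi0 using the eigendecomposition of the Hermitian H."""
    energies, vecs = np.linalg.eigh(H)
    return vecs @ (np.exp(-1j * energies * t) * (vecs.conj().T @ psi0))

n = 8
T = path_step_operator(n)
H = hamiltonian_from_step(T, K=0.5)

# T is not normal, yet H is Hermitian, so the evolution is unitary.
T_is_normal = np.allclose(T @ T.conj().T, T.conj().T @ T)
H_is_hermitian = np.allclose(H, H.conj().T)

psi0 = np.zeros(n, dtype=complex)
psi0[0] = 1.0                      # start at the first state on the path
psi_t = evolve(H, psi0, t=3.0)
norm_after = np.linalg.norm(psi_t)
```

For this toy path H is a tridiagonal matrix of the kind that describes a particle hopping on a 1-D lattice with hard walls at both ends, which is consistent with the wave packet picture used later for the task dynamics.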


Since each task consists of a sequence of computation and action phases, $T_{QR}$
can be written as a sum of computation and action phase operators


$$T_{QR} = T_a + T_c \tag{12.2}$$


where $T_a$ and $T_c$ describe respectively single steps of action and computation phases
of the quantum robot.


A computation phase ($T_c$ active) accepts as input the states of (o) and (m) and
the local environment. The computation, which is in general multistep, determines 
new states of (o) and (m) as output. These states determine the action to be carried 
out in the next action phase. During a computation phase there is no change in the 
environment state (other than that resulting from local observation) or the location 
of the quantum robot. 


An action phase ($T_a$ active) accepts as input the states of (o) and possibly (m).
Actions include motion of the quantum robot and local changes of the environment
state. They may be single step or multistep and may or may not require local
observation of the environment. The states of (o), (m), and the on board quantum
computer are not changed. An example of an action that does not require
observation is “move one site in the +y direction”. An example requiring observation is
“if spin 1/2 system is at the QR location rotate spin by $\theta$, else move 1 site in -z
direction”.


The function of the control qubit (c) is to regulate which phase type is active.
In particular $T_c$ or $T_a$ is active if (c) is in state $|0\rangle$ or $|1\rangle$ respectively. The last step, or iteration
of $T_c$ or $T_a$, of the computation or action phase includes the respective change
$|0\rangle_c \to |1\rangle_c$ or $|1\rangle_c \to |0\rangle_c$.


The conditions that $T_c$ and $T_a$ must satisfy can be expressed in terms of
properties of these operators relative to a reference basis for the quantum robot and
environment. To this end let the reference basis be given by


$$B_R = \{|m, k, \underline{t}\rangle|\ell_1\rangle_o|\ell_2\rangle_m|i\rangle_c|\underline{x}\rangle_{QR}|E\rangle\}.$$


Here $|m, k, \underline{t}\rangle$ denotes the state of the on board quantum Turing machine with
$|m\rangle$ and $|k\rangle$ the respective internal state and $\mathcal{L}_2$ location of the head $h_2$, and $|\underline{t}\rangle =
\otimes_{j=1}^{N+1}|t_j\rangle$ is the state of the qubits on $\mathcal{L}_2$ with $t_j = 0,1$ for N qubits and $t_j = 0, 1, 2$
for the marker qubit. The state $|\underline{x}\rangle_{QR} = |x, y, z\rangle_{QR}$ gives the lattice site location of
the quantum robot, denoted by the arrow in Figure 12.1. The state $|E\rangle$ denotes a
basis state in some basis $B_E$ of environment states.


In the following all environmental observations carried out by $T_c$ are assumed
to commute, with $B_E$ a common eigenbasis for the observations. In this case the
action of $T_c$ depends on but does not change the states $|E\rangle$. This, along with the
requirement that $T_c$ not change the QR location, gives


$$T_c = \sum_{\underline{x},E} P_{\underline{x},E}\, T_c\, P_{\underline{x},E}\, P^c_0 \tag{12.3}$$


where $P_{\underline{x},E} = |\underline{x}, E\rangle\langle\underline{x}, E|$ is the projection operator for the QR at site $\underline{x}$ and the
environment in state $|E\rangle$, and $P^c_0$ is the projection operator for the control system in
state $|0\rangle$. This equation shows that $T_c$ is diagonal in states $|\underline{x}, E\rangle$. The action of $T_c$
on states that are linear superpositions of the basis states $|\underline{x}, E\rangle$ will in general
introduce entanglements between these states and states of the quantum computer.
The presence of $P^c_0$ shows that $T_c$ is inactive if the control qubit is in state $|1\rangle$.
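The structure of Eq. 12.3 can be made concrete with a small matrix sketch. It assumes a toy system in which each combined label (site, environment state) is a single index, the on board computer is one qubit, and the control system is one qubit; the helper names `build_Tc` and `basis_state` and the choice of blocks are hypothetical and only illustrate the block-diagonal form.

```python
import numpy as np

def build_Tc(blocks):
    """Sum over labels of P_label (x) block_label (x) P0_control.

    blocks[i] is the operator applied to the on board computer when the
    robot position and environment are in the i-th basis state.
    """
    n = len(blocks)
    d = blocks[0].shape[0]
    P0 = np.diag([1.0, 0.0]).astype(complex)   # projector on control state |0>
    Tc = np.zeros((n * d * 2, n * d * 2), dtype=complex)
    for idx, block in enumerate(blocks):
        P = np.zeros((n, n), dtype=complex)
        P[idx, idx] = 1.0                      # projector on label idx
        Tc += np.kron(np.kron(P, block), P0)
    return Tc

def basis_state(idx, n, comp, control):
    """Product state |label idx> (x) |comp> (x) |control>."""
    e = np.zeros(n, dtype=complex)
    e[idx] = 1.0
    c = np.zeros(2, dtype=complex)
    c[control] = 1.0
    return np.kron(np.kron(e, np.asarray(comp, dtype=complex)), c)

# Toy choice: do nothing for label 0, flip the computer qubit for label 1.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Tc = build_Tc([I2, X])

active = Tc @ basis_state(1, 2, [1, 0], control=0)    # computer qubit flipped
inactive = Tc @ basis_state(1, 2, [1, 0], control=1)  # zero vector
```

The label is left unchanged while the computer state depends on it, and any state with the control qubit in $|1\rangle$ is mapped to zero, which is the sense in which the operator is diagonal in the labels and inactive outside the computation phase.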


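Locality conditions for $T_c^{\underline{x},E}$ acting on the component systems of the quantum
robot are given by


$$\langle m', k', \underline{t}'|T_c^{\underline{x},E}|m, k, \underline{t}\rangle = 0 \ \text{ if } \begin{cases} \underline{t}' \neq \underline{t} \text{ at sites} \neq k \\ |k' - k| > 1 \end{cases} \tag{12.4}$$

These conditions express requirements that single step changes in the on board
quantum computer limit the head $h_2$ motion to at most one $\mathcal{L}_2$ site in either direction
and that changes in the lattice qubit state are limited to the qubit at the head
location. Here $\langle\underline{x}, E|T_c|\underline{x}, E\rangle = T_c^{\underline{x},E}$ is the computation phase operator for the
quantum robot at site $\underline{x}$ and the environment in state $|E\rangle$, Eq. 12.3.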


An additional condition is that the operator $T_c^{\underline{x},E}$ depends on local
environmental conditions only. That is, if $|E\rangle$ and $|E'\rangle$ are two environment states that are the
same at site $\underline{x}$ but may differ at other locations then


$$T_c^{\underline{x},E} = T_c^{\underline{x},E'} \tag{12.5}$$


To understand this condition better, suppose that the environment consists of
n systems each with internal degrees of freedom described by states $|f\rangle$ with f
taking on M possible values. For qubits M = 2 with f = 0,1. If the systems
are distinguishable, a convenient basis representation for the states $|E\rangle$ is $|E\rangle =
\otimes_{\ell=1}^{n}|\underline{x}_\ell, f_\ell\rangle$. If the systems are considered as bosons or fermions, then these states
must be given the appropriate symmetry.


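If one system, the $\ell$th, is at site $\underline{x}$ and the rest are elsewhere, then $|E\rangle =
|E_{\underline{x}}\rangle|E_{\neq\underline{x}}\rangle$ where $|E_{\underline{x}}\rangle = |\underline{x}_\ell, f_\ell\rangle$ and $|E_{\neq\underline{x}}\rangle = |\underline{x}_1, f_1\rangle \cdots |\underline{x}_{\ell-1}, f_{\ell-1}\rangle|\underline{x}_{\ell+1}, f_{\ell+1}\rangle
\cdots |\underline{x}_n, f_n\rangle$. If systems $\ell_1, \ell_2, \cdots, \ell_m$ with $m < n$ are at site $\underline{x}$ and the rest
elsewhere, then $|E_{\underline{x}}\rangle = \otimes_{j=1}^{m}|\underline{x}, f_{\ell_j}\rangle_{\ell_j}$ and $|E_{\neq\underline{x}}\rangle = \otimes_h|\underline{x}_h, f_h\rangle_h$ where $\otimes_h$ is taken over
all $n - m$ systems for which $\underline{x}_h \neq \underline{x}$.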


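Using this notation the locality condition, Eq. 12.5, can be written as

$$\langle\underline{x}, E|T_c|\underline{x}, E\rangle = \langle E_{\underline{x}}|T_c|E_{\underline{x}}\rangle \tag{12.6}$$


Here $|E\rangle = |E_{\underline{x}}\rangle|E_{\neq\underline{x}}\rangle$ and $\langle\underline{x}|\underline{x}\rangle = \langle E_{\neq\underline{x}}|E_{\neq\underline{x}}\rangle = 1$ have been used. Note that
$\langle\underline{x}, E|T_c|\underline{x}, E\rangle = T_c^0$ if no environmental systems are at site $\underline{x}$. This is a valid step
operator that defines the computation phase steps if no environmental systems are
at the location of the quantum robot. This will be made use of later in discussing
a specific example.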


Note that $\langle E_{\underline{x}}|T_c|E_{\underline{x}}\rangle$ can be very complex as it can depend on all the m variables
$f_{\ell_1}, \cdots, f_{\ell_m}$ as well as on which of the n systems are at site $\underline{x}$. If the n environmental
systems are all identical fermions, then the complexity is much reduced because for
each value of f at most one system can be in the state $|\underline{x}, f\rangle$. If $T_c$ interacts with at
most one environmental system then $\langle E_{\underline{x}}|T_c|E_{\underline{x}}\rangle$ represents at most M + 1 distinct
computation phase operators, one for each value of f and one if no systems are
at the QR location. If $T_c$ can interact with more than one environmental system
then more combinations are possible. If the environmental systems are identical
bosons, then $T_c$ may have a more complex dependence on the local environment as
an arbitrary number of systems in the same internal state can be present at the QR
location and $T_c$ may depend on the number of systems present.


Much of the above discussion also applies to the action phase operator $T_a$. This
operator depends on but does not change the states of (o) and (m) relative to some
basis. This condition can be expressed by an equation similar to Eq. 12.3:


$$T_a = \sum_{\ell_1,\ell_2} P^{o,m}_{\ell_1,\ell_2}\, T_a\, P^{o,m}_{\ell_1,\ell_2}\, P^c_1 \tag{12.7}$$


where $P^{o,m}_{\ell_1,\ell_2}$ is the projection operator for (o) in state $|\ell_1\rangle$ and (m) in state $|\ell_2\rangle$ and
$P^c_1$ is the projection operator for (c) in state $|1\rangle$. This shows that $T_a$ is diagonal in
the states $|\ell_1\rangle_o|\ell_2\rangle_m$ and is inactive when (c) is in state $|0\rangle$.


The operator $T_a$ also satisfies locality conditions similar to those for $T_c$, Eq. 12.4:


$$\langle\underline{x}', E'|T_a^{\ell_1,\ell_2}|\underline{x}, E\rangle = 0 \ \text{ if } \begin{cases} |E'\rangle \neq |E\rangle \text{ at sites} \neq \underline{x}', \underline{x} \\ |\underline{x}' - \underline{x}| > 1 \end{cases} \tag{12.8}$$

where $T_a^{\ell_1,\ell_2} = \langle\ell_1\ell_2|T_a|\ell_1\ell_2\rangle$. These conditions express the facts that during an
action phase changes in the environment state are limited to sites $\underline{x}', \underline{x}$ and that
the quantum robot moves at most one site in any direction. The states $|m, k, \underline{t}\rangle$ of
the on board quantum computer are suppressed in the above as $T_a$ is the identity
operator in the subspace spanned by these states.


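The requirement that the action is independent of states of environmental
systems distant from the quantum robot can be expressed in a fashion similar to that
for $T_c$:


$$\langle\underline{x}', E'|T_a^{\ell_1,\ell_2}|\underline{x}, E\rangle = \langle E'_{\neq\underline{x}',\underline{x}}|E_{\neq\underline{x}',\underline{x}}\rangle\langle\underline{x}', E'_{\underline{x}',\underline{x}}|T_a^{\ell_1,\ell_2}|\underline{x}, E_{\underline{x}',\underline{x}}\rangle \tag{12.9}$$


for all $\underline{x}', \underline{x}$ such that $|\underline{x}' - \underline{x}| \leq 1$. The states $|E_{\underline{x}',\underline{x}}\rangle$ and $|E_{\neq\underline{x}',\underline{x}}\rangle$ describe the
environment at sites $\underline{x}', \underline{x}$ and elsewhere. The definition of these states is similar
to that given earlier for $|E_{\underline{x}}\rangle$ and $|E_{\neq\underline{x}}\rangle$. Also $|E\rangle = |E_{\underline{x}',\underline{x}}\rangle|E_{\neq\underline{x}',\underline{x}}\rangle$ has been used.
The matrix element $\langle E'_{\neq\underline{x}',\underline{x}}|E_{\neq\underline{x}',\underline{x}}\rangle = 1$ if and only if $|E'\rangle = |E\rangle$ at sites $\neq \underline{x}, \underline{x}'$.
Otherwise it equals 0.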


The right-hand matrix element expresses the condition that one action phase 
step can change the environment at most at the initial and final locations of the 
quantum robot. If environmental changes at other locations are allowed then 
Eqs. 12.8 and 12.9 must be changed accordingly. 


Several additional aspects of the properties of $T_c$ and $T_a$ need to be noted. One is
that to avoid complications, the need for history recording has not been discussed.
Both the computation and action phases may need to record some history. For
example when $T_c$ is active, the change $|\ell_1\rangle_o|\ell_2\rangle_m \to |\ell_1'\rangle_o|\ell_2'\rangle_m$ requires history
recording if the change is not reversible. Where records are stored (on board the
quantum computer or in the environment) depends on the model. Also the task
carried out by the quantum robot may not be reversible unless the initial environment
is copied or recovered.


Initial and final states for the starting and completion of tasks may be needed.
For example at the outset the memory, output, and control systems might be in
the state $|0\rangle_m|\ell_i\rangle_o|0\rangle_c$ and the environment would be in some suitable initial state.
The process begins with the on board quantum computer active.


Completion of a task could be described by designating one or more states $|\ell_f\rangle$
as final output states and arranging matters so that motion of some type occurs
that does not destroy the relevant parts of the final task state. This ballast motion
can occur on board the quantum computer or consist of motion of the quantum
computer or some other system along a path in the environment without
changing the environmental state. If the ballast motion occurs on board the quantum
computer and it is described by states in a finite dimensional Hilbert space, the
stability of the final task state lasts for a finite time only before the task is undone.
Because the evolution operator $e^{-iHt}$ is unitary, continued motion of some type is
necessary.


The conditions given above for $T_c$ and $T_a$ are sufficiently general to allow for
branching tasks with states describing entangled activities. For example during a
computation phase $T_c$ can take (o) and (m) states $|\ell_1, \ell_2\rangle$ into linear superpositions
$\sum_{\ell_1',\ell_2'} c_{\ell_1',\ell_2'}|\ell_1', \ell_2'\rangle$. Similarly the action of $T_a$ can take environment and QR position
states $|\underline{x}, E\rangle$ into linear superpositions $\sum_{\underline{x}',E'} c_{\underline{x}',E'}|\underline{x}', E'\rangle$. In this case the sum
is limited to values of $\underline{x}', E'$ that satisfy Eq. 12.8. Note that Eq. 12.7 is satisfied
separately by each branch.


Additional branching is possible if the action of $T_a$ or $T_c$ takes control qubit
states into linear sums of $|0\rangle$ and $|1\rangle$. This allows for entanglements of action and
computation phases.


12.4. A Specific Task in a Simple Environment


Here a specific task in a very simple environment will be considered to illustrate some 
aspects of the models discussed above. The environment consists of one spinless 
particle (p) on a 1-D lattice. In this case a convenient environmental basis is given 
by the states |x) that denote the position of the particle on the lattice. 


The task considered is “search to the right for the particle. If it is found, bring 
it back to the initial location of the quantum robot.” This task consists of stepwise 
motion of the QR to the right, examining each successive site for the particle. If the 
particle is found, the QR returns to the initial location with the particle. It is clear 
that in order to carry out this task reversibly, the on board quantum computer must 
keep track of the number of sites searched. Reversibility also requires permanent 
recording of the distance between the QR and the particle, if found. 


The overall quantum robot plus environment state transformation resulting from
carrying out the task can be represented as $|j\rangle_{QR}\theta(i)|x\rangle_p \to |j\rangle_{QR}\theta(x - j)|j\rangle_p$
provided the particle is found. Here $|j\rangle_{QR}|x\rangle_p$ denote the respective lattice positions
of the quantum robot and the particle, and $\theta(i)$ denotes the initial state of internal
degrees of freedom of the on board quantum computer. The state $\theta(x - j)$ is the
final state of the quantum computer with the distance from the quantum robot to
the particle recorded in the memory.


Fig. 12.2. A Schematic Model of a Quantum Robot for the Specific Task on a 1-D
Environment Space Lattice. The particle (p) is not shown. The other systems are as in
Figure 12.1 except that the (m) system is expanded into an N + 2 qubit lattice $\mathcal{L}_3$. The
position of the quantum robot on the environment lattice is shown with an arrow.


If the initial state is a linear superposition of QR and (p) position states the
overall task transformation is given by


$$\psi_i = \sum_{j,x} c_{j,x}|j\rangle_{QR}|x\rangle_p\theta(i) \Longrightarrow {\sum_{j,x}}' c_{j,x}|j\rangle_{QR}|j\rangle_p\theta(x - j) + \psi_{nf} \tag{12.10}$$

The prime on the sum means that it is limited to values of $x - j$ such that $0 \leq
x - j \leq 2^N - 1$. For these values the QR will find the particle. What happens if
$x - j$ is outside this range (the particle is not found) depends on model assumptions.
The state $\psi_{nf}$ represents the task transformation if the particle is not found.
The overall task transformation is reversible provided the states $\theta(d)$ are pairwise
orthogonal for different values of d and are orthogonal to the initial state $\theta(i)$.

For carrying out this task the $\mathcal{L}_2$ qubit lattice of the on board quantum Turing
machine contains N + 2 qubits: N qubits are used for numbers $0, 1, \cdots, 2^N - 1$,
one qubit, which is ternary, is a marker, and the remaining qubit adjacent to the
marker denotes the sign of the number ($|1\rangle \sim +$, $|0\rangle \sim -$). This lattice will be used
as a short term memory to keep a running count of the number of sites the QR
moves at each step.


The memory system (m) is expanded to be another N + 2 qubit lattice $\mathcal{L}_3$ like
$\mathcal{L}_2$. It is used to record permanently the distance $x - j$ between the initial location
of the QR and (p). It corresponds to $\theta(x - j)$ in Eq. 12.10. Figure 12.2 shows the
setup on a 1-D lattice environment.


There are three types of actions carried out in action phases for this task: search,
return, and do nothing. Corresponding to these, the output system (o) has three
internal states $|sr\rangle_o$, $|rt\rangle_o$, $|dn\rangle_o$. The search and return action phases carry out
the transformations $|j\rangle_{QR}|x\rangle_p \to |j + 1\rangle_{QR}|x\rangle_p$ and $|j\rangle_{QR}|j\rangle_p \to |j - 1\rangle_{QR}|j - 1\rangle_p$.
Do nothing is the identity operator on the QR and (p) position states. All these
actions are single step and do not involve environment observations.


The task begins with the number +0 on both on board lattices and (o) in state
$|dn\rangle_o$ and the computation phase active. If the particle (p) is at the QR location,
the computation subtracts 1 from 0 on the running memory lattice $\mathcal{L}_2$ and does
not change the state of (o). If (p) is not at the location of QR, the computation
phase adds 1 to the running memory and changes the (o) state to $|sr\rangle_o$. In this case
the subsequent action phase shifts the QR 1 site to the right and the computation
phase becomes active again.


This stepwise process of adding 1 to the number on the running memory with
no change in the (o) state $|sr\rangle_o$ in the computation phase, and one site QR motion
in the action phase continues until (p) is located. At this point the computation
phase copies the number from running memory to the permanent memory qubit
lattice $\mathcal{L}_3$, subtracts 1 from the running memory, and changes the (o) state to $|rt\rangle_o$.
The next action phase moves both the QR and (p) back one lattice site.


This process continues until the number 0 appears on the running memory as
part of the input to a computation phase. This computation subtracts 1 from the
running memory and changes the state of (o) to $|dn\rangle_o$. At this point the task is
completed and the ballast phase begins. In the model described here ballast phase
motion consists of repeated subtraction of 1 from the running memory with do
nothing action phases.
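The alternation of computation and action phases described above can be traced on a single basis state with a short classical simulation. This is only a sketch of the bookkeeping: integers stand in for the qubit lattices, the returned dictionary layout is an assumption, and the case where the particle is not found simply returns `None` instead of modelling the reflection discussed later.

```python
def run_task(j0, x, N):
    """Trace the search-and-return task for QR start j0 and particle site x.

    d  = running memory, st = permanent memory, o = output system state.
    Returns the state at the start of the ballast phase, or None if the
    particle is not found within 2**N - 1 sites to the right.
    """
    qr, p = j0, x
    d, st, o = 0, 0, "dn"
    mode = "search"
    while True:
        # Computation phase: the robot does not move.
        if mode == "search":
            if qr == p:                    # particle located
                st = d                     # copy running count to permanent memory
                o = "rt" if d > 0 else "dn"
                mode = "return" if d > 0 else "ballast"
                d -= 1
            elif d == 2**N - 1:            # running memory is full
                return None
            else:
                d += 1
                o = "sr"
        elif mode == "return":
            if d == 0:                     # back at the initial site
                o = "dn"
                mode = "ballast"
            d -= 1
        if mode == "ballast":
            return {"qr": qr, "p": p, "st": st, "d": d, "o": o}
        # Action phase: single step determined by the output system state.
        if o == "sr":
            qr += 1
        elif o == "rt":
            qr -= 1
            p -= 1

final = run_task(j0=3, x=6, N=3)
```

For the call shown, the robot moves three sites right, finds the particle, and returns with it, so that both end at site 3 with the distance 3 in permanent memory and -1 on the running memory when the ballast phase starts.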


The task dynamics described above is shown schematically in Figure 12.3 as a 
decision tree. The round circles including “sr”, “rt”, and “dn” denote action phases. 
The square boxes between successive action phases, denote memory system states 
(d = running memory and st = permanent memory), and questions with answers 
based on local environmental states. The collection of boxes and arrows between 
successive actions shows what is done during each computation phase. The left 
hand column shows the dynamics during the search part of the task. The central 
column, with horizontal arrows only, shows changes made in memory states when 
the particle (p) is found, and the righthand column shows the dynamics during the 
return part of the task. The righthand row at the top shows progress during the 
ballast part of the task. The far righthand zero and that at the end of the search 
phase are explained below. 


Fig. 12.3. Decision Tree for the Example Task. Task process motion is indicated by
the arrows. Circles represent action phases. Square boxes show relevant states of systems.
Permanent storage and running memory are shown respectively by “st” and “d”. The boxes
between adjacent action phase circles show what occurs during a computation phase. The
lefthand column shows task progress during the first search part. The center column with
horizontal arrows shows what happens in a computation phase when (p) is first located.
The righthand column shows task progress during the return part. The ballast activities
that occur when the task is complete are shown in the upper right.


The ballast motion continues until the number $-(2^N - 1)$ appears on the running
memory. Here the model step operator $T_c$ is defined so that it annihilates the system
state when it becomes active provided this number is in the running memory. Since
the overall evolution is unitary with the Hamiltonian of Eq. 12.1, the effect of the
annihilation is to provide an infinite reflecting barrier at the quantum robot +
environment path state where the annihilation occurs. The progress of the task
under the evolution operator $e^{-iHt}$ can be considered as motion of an overall system
wave packet of states on a path of states of the quantum robot and environment.
The path is defined by successive iterations of $T_{QR}$ and its adjoint on a suitable
initial system state.


The wave packet motion along the ballast part of the path continues until the 
barrier is reached. The effect of the barrier is to reflect the wave packet backwards 
along the path and to undo the task after all the ballast subtractions are undone. 
The length of time it takes before undoing the task begins, depends on the constant 
K and (exponentially) on the number N of qubits in the on board lattices. 


The same annihilation occurs if the particle (p) is initially either to the left of
the quantum robot or at least $2^N$ sites to the right (bottom of Figure 12.3). In
this case annihilation occurs during the search part of the task when the positive
number $2^N - 1$ is on the running memory lattice. The same reflection backward
occurs with the search being undone if (p) is not found.


The description of $T_{QR}$ is based on the requirement that $T_{QR}$ be distinct path
generating in some basis. This concept was described elsewhere [8] where it was
applied to quantum computers. There is no reason why it should not also apply
to models of quantum robots interacting with environments where the model is
described by step operators as is done here. For this picture of task dynamics to
apply to the task example, one must show that a task step operator $T_{QR}$ exists that
is distinct path generating at least on a subspace spanned by the set of suitable
initial states. (That is, each state in the set must be on a separate path that has no
overlap with any other path.) This includes showing that $T_{QR}$ is such that unitary
evolution with the Hamiltonian of Eq. 12.1 generates the overall task transformation
of Eq. 12.10. This will be done in the Appendix.


One consequence of $T_{QR}$ being distinct path generating in the reference basis is
that there are no branchings or entanglements generated by Schrödinger evolution
with the Hamiltonian of Eq. 12.1. However quantum effects are present in that the
motion along the task state paths corresponds to motion of a quantum system on a 
1-D lattice on which infinite (or finite) reflecting barriers may be present [8]. Also 
any linear superposition present in the initial state will be preserved. This is shown 
by Eq. 12.10. 


12.5. Discussion


It must be strongly emphasized that a main reason for studying quantum robots 
and their interactions with environments of quantum systems is that it provides a 
well defined platform for investigation of many interesting questions. For example 
“What properties must a quantum system have so that one can conclude that it is
aware of its environment, makes decisions, and has other properties of intelligence?”
Answering such a question, even for models of quantum robots plus environments as 
defined here, seems difficult enough. Without any such model it seems impossible. 


The fact that the only known examples of intelligent quantum systems are very 
complex and contain on the order of $10^{23}$ degrees of freedom only emphasizes this
point. A close study of simple systems introduced here may help to show exactly 
how such systems can be made more complex so that in some well defined limit (if 
a limit is needed) they become aware of their environment and intelligent. Without 
some such well defined platform it seems hopeless to try to answer such questions. 


It is also worthwhile to consider the following speculative ideas. The close con- 
nection between quantum computers and quantum robots interacting with environ- 
ments suggests that the class of all possible physical experiments may be amenable 
to characterization just as is done for the computable functions by the Church- 
Turing hypothesis [6, 33, 34]. That is there may be a similar hypothesis for the 
class of physical experiments. 


The description of tasks carried out by quantum robots (Section 12.2) lends 
support to this idea in that there may be an equivalent Church-Turing hypothesis for 
the collection of all tasks that can be carried out. The earlier work that characterizes
physical procedures as collections of instructions [21, 35], or state preparation and
observation procedures as instruction booklets or programs for robots [22] also
supports this idea. On the other hand much work needs to be done to give a precise 
characterization of physical experiments, if such is indeed possible. 


Appendix 


The requirements that the computation phase step operator $T_c$ must satisfy in order
to carry out the example can be given as a set of 9 conditions. The conditions, which
follow, are based on the task description and on the decision tree.

In these conditions only the relevant component system states are listed. Also
the control qubit transformation $|0\rangle_c \to |1\rangle_c$ common to all the conditions has been
left out. The state subscripts “st” and “ru” refer to the permanent storage and
running memory qubit lattices, $\mathcal{L}_3$ and $\mathcal{L}_2$ respectively. The underlined number
variables denote the qubit string equivalent of the underlined number. Here for
both on board lattices increasing significance of the bits in the strings is in the
counterclockwise direction from the marker qubit (Figure 12.2). The state of the
most significant N + 1st qubit (adjacent to the marker in the clockwise direction)
is the sign qubit, $|1\rangle \sim +$ and $|0\rangle \sim -$. The condition numbers above the arrows are
present for discussion purposes only.


Conditions 2 and 3 show the overall state changes during each computation phase
in the search part of the task (the lefthand column of Figure 12.3). Condition 4
applies in case the particle is not found. Conditions 1 and 5 apply for a computation
phase when the particle is first located at the quantum robot site during the search.
Conditions 6 and 7 apply to the return part of the task (righthand column of
Figure 12.3), and conditions 8 and 9 describe the computation phases during the
ballast motion when the task is completed. Note that in condition 5 the value of d
recorded in the storage lattice is the distance from (p) to the initial position of the
quantum robot.

The arrow denotes that each computation phase consists of many steps or
iterations of $T_c$. For example the locality conditions for the head $h_2$ motion on $\mathcal{L}_2$
and $\mathcal{L}_3$ mean that adding or subtracting 1 to the number recorded on the running
memory lattice takes several steps. For example the change 211110 + 1 = 200001
where 2 is the marker qubit and $h_2$ begins and ends at the marker takes at least 10
steps.


Conditions 3 and 4 show a different computation phase for the same input state
depending on whether d < 2^N − 1 or d = 2^N − 1. Determining which alternative
applies does not require a complete search of L_2 each time a search computation is
carried out. This follows from the fact that h_2, in moving past a string
of 1s to find a 0, will encounter the marker before a 0 if and only if 2^N − 1 is on the
“ru” lattice.
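
The marker test described above can be sketched in a few lines of Python. This is only an illustration of the classical bookkeeping: the list layout (index 0 next to the marker, least significant bit first), the function name `increment_on_ring`, and the omission of the sign qubit are assumptions made here, and the sketch does not model the unitary step operator or count its steps.

```python
def increment_on_ring(bits):
    """Add 1 to an N-bit number stored least-significant-bit first.

    The head starts at the marker and walks counterclockwise past a run of 1s.
    Returns (new_bits, overflow). overflow is True when the head meets the
    marker again before finding a 0, i.e. when the register held 2**N - 1.
    """
    new_bits = list(bits)
    for position, bit in enumerate(new_bits):
        if bit == 0:
            new_bits[position] = 1       # the first 0 found becomes 1
            for passed in range(position):
                new_bits[passed] = 0     # every 1 passed on the way becomes 0
            return new_bits, False
    # No 0 was found: the next site the head reaches is the marker.
    return list(bits), True


# The text's example 11110 + 1 = 00001 (marker omitted, LSB first).
print(increment_on_ring([1, 1, 1, 1, 0]))
# A register of all 1s holds 2**N - 1, so the marker is reached first.
print(increment_on_ring([1, 1, 1, 1, 1]))
```

The overflow flag plays the role of the choice between the two conditions: no separate scan of the whole lattice is needed, because the increment itself discovers whether a 0 exists.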


A similar situation applies for the choice between conditions 8 and 9, except that
here, in carrying out the subtraction, h_2 must each time move 1 qubit beyond the
first 0 encountered to see whether the 0 is adjacent to the marker. If it is not, condition
8 applies; otherwise condition 9 holds. The choice needed for conditions 6 and 7 is similar.
Here h_2 moves 1 qubit beyond the first 1 encountered to see if it is adjacent to the
marker. If not, the 1 is changed to a 0 and all the previously passed 0s change to 1s. If
the 1 is adjacent to the marker, it is changed to a 0 and the qubit on the other side
of the marker is changed to a 1.


The conditions satisfied by the action phase operator T_a are much simpler. They
are


|sr⟩_o |j⟩_QR |x⟩_p → |sr⟩_o |j + 1⟩_QR |x⟩_p
|rt⟩_o |j⟩_QR |j⟩_p → |rt⟩_o |j − 1⟩_QR |j − 1⟩_p
|dn⟩_o |j⟩_QR |x⟩_p → |dn⟩_o |j⟩_QR |x⟩_p


As was the case for the conditions for T_c, only the relevant changes are included.
The control qubit change |1⟩_c → |0⟩_c, common to each condition, was not included.
Also T_a is diagonal in the states of the on-board quantum Turing machine. The
conditions show explicitly how T_a depends on but does not change the (o) system
states. Each of these transformations corresponds to a single step or iteration of
T_a.


The above conditions and discussion are sufficient to see that T_QR = T_a + T_c is
distinct path generating, at least on the subspace spanned by the basis states that
are relevant to the task. The action of T_QR on other basis states is not specified
because they are not relevant to the task. The reason is that there will never be
any overlap between these states and those that occur while the task is carried
out. Examples of these other states are those in which (o) is in state |dn⟩ and a
positive number is in the running memory.


QUANTUM INFORMATION THEORY


Charles H. Bennett 


Abstract 


The new theory of coherent transmission and transformation of information in the 
form of intact quantum states represents a major extension and generalization of 
classical information and computation theory, and has a number of distinctive fea- 
tures, including dramatically faster algorithms for certain problems, more complex 
kinds of channel capacity and cryptography, and a second, quantifiable kind of 
information—entanglement—which interacts with classical information in phenom- 
ena such as quantum teleportation. 


13.1 Introduction 


Quantum mechanics has many paradoxes, but perhaps the greatest is the qualitative
disparity between quantum laws and the macroscopic world. For most of the 20th
century, physicists and chemists have used quantum mechanics to build an edifice
(Fig. 13.1) of quantitative explanation and prediction covering almost all features of
our everyday world: the rigidity of stone, the transparency of glass, the luminosity
of the sun, among many other things. But if we look at the foundations of this
structure we find a set of laws (Figs. 13.2, 13.3) that, like the Ten Commandments,
are marvelously concise but seem to bear almost no relation to the way things work
in everyday life. In order to see quantum effects like superposition or entanglement
face to face, rather than through a hard-to-understand chain of indirection, one
must go to a textbook or a laboratory. Aside from its intrinsic scientific interest, the
burgeoning field of quantum information processing may have an important cultural
side effect of bringing the formerly arcane foundations of quantum mechanics into
popular awareness, much as the computer revolution has done with the formerly
arcane foundations of computer science, notions such as computational universality
which are in essence understood by anyone who goes into a software shop and asks
“Do you have a version of this that will run on my Mac?”


Fig. 13.1.


The distinctive features of quantum data processing are sketched in Fig. 13.4.
In classical computations we use bits to represent data, and if n bits are present
we have 2^n possible states. In quantum computation the bits are replaced by
qubits: quantum systems, such as polarized photons or spin-1/2 particles, with
a 2-dimensional Hilbert space and capable of existing in a pair of orthogonal states
identified with the Boolean values (for example ↔ = |0⟩ and ↕ = |1⟩) as well as all
superpositions. A set of n qubits can exist in 2^n Boolean states and all possible
superpositions, so that the state is represented by a vector, or more precisely a ray,
in a 2^n-dimensional Hilbert space. Just as any transformation of classical data can
be expressed as a sequence of simple gates (e.g. NOT and AND) acting on the
bits one and two at a time, any transformation of quantum data can be expressed
as an array of two-qubit controlled-NOT gates (also called CNOT or XOR) and
one-qubit unitary rotations. The central feature of quantum data processing is
the superposition principle: if a quantum gate is fed a superposition of inputs,
it yields the corresponding superposition of outputs. As in a classical computer
logic diagram, gates are interconnected by “wires”, representing whatever physical
mechanism is used to coherently store or transmit a qubit from one gate operation
to the next; but, unlike classical data, the data in a bundle of quantum wires is
generally entangled, that is, not expressible by separately specifying the state of each wire.
A simple and important example of entanglement, an Einstein-Podolsky-Rosen
(EPR) state of two qubits, arises as a necessary consequence of the superposition
principle when a CNOT is applied to non-Boolean but unentangled data as in
Fig. 13.4b (we often use shading to indicate entanglement between wires). Either
of the entangled qubits alone appears completely random, yet together they are in
a definite state.
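
The appearance of an EPR state from a CNOT acting on non-Boolean data can be checked numerically. The following is a minimal sketch in plain Python; the basis ordering (control qubit first), the helper names, and the choice of (|0⟩ + |1⟩)/√2 as the non-Boolean input are assumptions made for illustration.

```python
import math

# Basis order for two qubits: |00>, |01>, |10>, |11> (control first, target second).
CNOT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]


def apply(matrix, state):
    """Multiply a square matrix by a state vector."""
    return [sum(row[k] * state[k] for k in range(len(state))) for row in matrix]


def tensor(a, b):
    """Tensor product of two single-qubit state vectors."""
    return [x * y for x in a for y in b]


def is_product_state(state, tol=1e-12):
    """a|00> + b|01> + c|10> + d|11> is unentangled exactly when ad - bc = 0."""
    a, b, c, d = state
    return abs(a * d - b * c) < tol


plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # non-Boolean control input
zero = [1.0, 0.0]                             # Boolean target input

before = tensor(plus, zero)
after = apply(CNOT, before)

print(before, is_product_state(before))   # unentangled input
print(after, is_product_state(after))     # (|00> + |11>)/sqrt(2), entangled
```

The input passes the product-state test and the output fails it, which is the sense in which the data on the two wires can no longer be described wire by wire.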


Fig. 13.2.


Because quantum mechanics encompasses classical mechanics, classical computations may
be viewed as a subset of quantum computations, as real numbers are a subset of
complex numbers. A classical bit may, without loss of generality, be viewed as a
qubit promised to be in one of the states |0⟩ or |1⟩, and a classical wire is a wire
that conducts these two states reliably, but introduces a random disturbance if it
is asked to conduct a superposition. A classical wire may be viewed as a kind of
noisy quantum channel, in which the incoming qubit interacts via a CNOT with an
ancillary qubit, which is then dumped into the environment (cf. Fig. 13.4c). The
wire’s environment, in other words, makes a quantum nondemolition measurement
in the |0⟩, |1⟩ basis on the data passing through. In logic diagrams we use thick
lines to represent classical wires and thin lines to represent quantum wires.
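
The effect of such a wire on a superposition can be illustrated by computing the reduced density matrix of the data qubit after the ancilla is discarded. This is a sketch under stated assumptions: the ancilla is taken to start in |0⟩, and the function names are chosen here for illustration.

```python
import math


def classical_wire(alpha, beta):
    """Reduced density matrix of the qubit alpha|0> + beta|1> after a CNOT
    copies it onto an ancilla (initially |0>) that is then discarded.

    The joint state is alpha|00> + beta|11>; tracing out the ancilla gives
    rho[i][j] = sum_e psi[i][e] * conj(psi[j][e]).
    """
    psi = [[alpha, 0.0], [0.0, beta]]          # psi[data][ancilla]
    return [
        [sum(psi[i][e] * complex(psi[j][e]).conjugate() for e in range(2))
         for j in range(2)]
        for i in range(2)
    ]


def pure_density(alpha, beta):
    """Density matrix of the undisturbed qubit, for comparison."""
    amps = [alpha, beta]
    return [[amps[i] * complex(amps[j]).conjugate() for j in range(2)]
            for i in range(2)]


s = 1 / math.sqrt(2)
print(pure_density(s, s))         # off-diagonal coherences present
print(classical_wire(s, s))       # same diagonal, coherences removed
print(classical_wire(1.0, 0.0))   # a Boolean input passes through unchanged
```

The diagonal entries (the probabilities of 0 and 1) survive the wire, while the off-diagonal entries that distinguish a superposition from a random bit do not.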


13.2 Quantum Computing 


Interest in quantum information processing has centered on quantum computing,
the possibility of using quantum operations in the intermediate stages of a
computation to greatly speed up the solution of certain classically hard problems (cf.
Fig. 13.5). I am tempted to say that, as a topic of fundamental scientific inquiry,
quantum computing in this sense is so close to finished as not to be interesting
any more. Of course there are some practical details to be worked out, like
actually building a quantum computer, but the basic outlines are well understood:
some computations, such as integer factoring, can be sped up exponentially on a
quantum computer [1]; others, including many NP-type optimization and search
problems, can be sped up quadratically [2, 3]; while yet other problems cannot be
sped up at all [4]. It is actually an exaggeration to declare quantum computation
a dead topic, because a number of important theoretical questions remain, notably
determining which other classically hard problems can be sped up by quantum
computers. Prime candidates are graph isomorphism and computing the permanent of
a matrix. Quantum computers have been shown capable of simulating the dynamics
of a general quantum system given its Hamiltonian; for many systems no efficient
classical algorithm for doing so is known.


Fig. 13.3.


Practically, we are still far from building a quantum computer that would realize 
these speedups in a useful way. The goal of doing so looked quite unrealistic until the 
development, over the last two or three years, of efficient quantum error-correction 
techniques [5], which in principle allow computers made of unreliable parts to do 
arbitrarily long quantum computations reliably, if the error and decoherence per 
elementary gate operation can be made less than some threshold, estimated to 
be around 107? to 10—®. Quantum fault-tolerant computing may be viewed as a 
generalization of the fault-tolerant circuits that were developed in the early days of 
classical computing, but are scarcely needed nowadays owing to the high intrinsic 
reliability of today’s hardware. In the quantum realm, by contrast, the error and
decoherence rates of today’s rudimentary quantum hardware are still several orders of
magnitude too high for fault-tolerant techniques to take hold. Quantum computing
is like controlled fusion: possible in principle, but maybe not practical for a long
time to come.
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Fig. 13.4. 


13.3 Transmission of Quantum Information 


Besides quantum computing per se, quantum information processing encompasses 
a range of other topics which broadly generalize classical communication and infor- 
mation theory. It is here that I think the most exciting theoretical work remains to 
be done. This is the wild west, or internet, of quantum information science. 


Problems in classical information transmission or communication involve
several parties, e.g. a sender (“Alice”) and a receiver (“Bob”), and in cryptographic
scenarios an eavesdropper (“Eve”). The parties have some informational task they
wish to accomplish, which may be viewed as a successful transition from an initial
state to a corresponding final state of the entire apparatus, for example an initial
state in which Alice holds a bit string x chosen from some source distribution X,
and a final state in which Bob holds x. The protocol used to achieve this goal
should work with certainty, or with high probability, and it should be economical
of communication resources, such as number of uses of a noiseless or noisy classical
channel. In adversarial settings such as cryptography, the goal of one subset of the
parties should be achievable despite the efforts of another subset; for example, Bob
should learn x while Eve is prevented from gaining any information about x.


Among the most important notions of classical information theory are source
entropy and channel capacity and the related techniques of source and channel
coding. These techniques make sources and channels fungible, in the sense that the
number of channels asymptotically required to reliably communicate the output of
a source depends only on the ratio of source entropy to channel capacity, and not
on other properties of the source or channel.


When the notion of information is extended to include quantum states as well 
as classical data, the scope of communication theory expands correspondingly, with 
quantum analogs of source entropy and channel capacity as well as a new fungible 
resource, entanglement, which can interact with classical and quantum information 
in a variety of ways. 


If a quantum source emits states ψ_i with probability p_i, its von Neumann
entropy, S = −Tr ρ log_2 ρ, where ρ = Σ_i p_i |ψ_i⟩⟨ψ_i|, determines the minimum
asymptotic number of qubits to which its signals can be compressed by a quantum encoder
and still faithfully recovered by a quantum decoder. This is the analog of classical
data compression or source coding, by which redundant classical data is compressed
and faithfully regenerated. But quantum data compression [6] differs in that it can
be applied to non-orthogonal states (for example equiprobable horizontal and
diagonal photons, as shown in Fig. 13.6a) which would be spoiled if one tried to
compress them classically. Also, because the states are non-orthogonal, the encoder
cannot retain a copy of them, or indeed any memory of them, if they are to be
faithfully reconstructed at the receiving end. This difference is a manifestation of
the quantum no-cloning principle, according to which cloning a general quantum
state, i.e. the mapping |ψ⟩ ⊗ |0⟩ → |ψ⟩ ⊗ |ψ⟩, where ψ is an unknown state in a 2-
or higher-dimensional Hilbert space, is non-physical and cannot be accomplished
by any apparatus. By contrast the copying of classical data is easy, much to the
consternation of software manufacturers. A quantum encoder is thus like a discreet
telegrapher, who transmits other people’s messages without remembering them.
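
As a worked instance of the entropy formula, the ensemble of equiprobable horizontal and diagonal photons mentioned above can be evaluated directly. The sketch below is restricted to real single-qubit states so that a closed-form 2×2 eigenvalue formula suffices; the function names and that restriction are assumptions made for illustration.

```python
import math


def von_neumann_entropy_2x2(rho):
    """Entropy S = -Tr(rho log2 rho) of a real symmetric 2x2 density matrix."""
    a, b, d = rho[0][0], rho[0][1], rho[1][1]
    # Eigenvalues of [[a, b], [b, d]] from the trace and determinant.
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    eigenvalues = [mean + radius, mean - radius]
    return -sum(p * math.log2(p) for p in eigenvalues if p > 1e-15)


def mixture(states, probabilities):
    """rho = sum_i p_i |psi_i><psi_i| for real single-qubit states."""
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for psi, p in zip(states, probabilities):
        for i in range(2):
            for j in range(2):
                rho[i][j] += p * psi[i] * psi[j]
    return rho


horizontal = [1.0, 0.0]
diagonal = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # 45-degree polarization

rho = mixture([horizontal, diagonal], [0.5, 0.5])
print(von_neumann_entropy_2x2(rho))   # about 0.60 qubits per photon
```

Because the two signal states overlap, the entropy comes out near 0.60 rather than 1, so asymptotically about 0.6 qubits per photon suffice, even though no classical description of which state was sent could be compressed this way without disturbing the states.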


Source coding removes redundancy so data can be sent more efficiently through
a noiseless channel; error-correcting or channel coding, by contrast, introduces
redundancy to enable data to withstand transmission through a noisy channel. The
simplest classical error-correcting code is the threefold repetition code 0 → 000,
1 → 111, which permits the encoded bit to be faithfully recovered after up to one
transmission error in the three-bit codeword. Analogous error-correcting codes exist
for quantum data, but they require more redundancy because they need to protect
not only Boolean states, but also arbitrary superpositions of them [7, 8, 10]. Thus
the simplest single-error-correcting quantum code (Fig. 13.6b) encodes an arbitrary
input qubit |ξ⟩ into an entangled state of five qubits, in such a way that if any one
is corrupted en route, the decoder can funnel the effects of the error into the four
ancillary qubits, while restoring the first qubit to its original state. By the same
token, a noisy quantum channel’s capacity Q for faithfully transmitting qubits
(determined by the amount of redundancy quantum error-correcting codes require to
achieve asymptotically perfect fidelity transmission in the limit of large blocksize)
is generally less, and can never be greater, than its capacity C for transmitting
classical bits. The inequality C ≥ Q holds for all channels because if a channel
can faithfully transmit a general qubit, then it can certainly transmit the particular
qubits |0⟩ and |1⟩.
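
The classical threefold repetition code mentioned above is small enough to check exhaustively. The sketch below covers only the classical code; the function names are chosen here for illustration, and nothing in it carries over directly to the five-qubit quantum code, which cannot rely on copying.

```python
def encode(bit):
    """Threefold repetition: 0 -> 000, 1 -> 111."""
    return [bit, bit, bit]


def decode(codeword):
    """Majority vote recovers the bit after at most one flip."""
    return 1 if sum(codeword) >= 2 else 0


def flip(codeword, position):
    """Model a single transmission error at the given position."""
    corrupted = list(codeword)
    corrupted[position] ^= 1
    return corrupted


# Every single-bit error on either codeword is corrected.
for bit in (0, 1):
    for position in range(3):
        assert decode(flip(encode(bit), position)) == bit

# Two errors defeat the code: 000 -> 110 decodes to 1.
print(decode(flip(flip(encode(0), 0), 1)))
```

Majority voting works because the decoder may freely read and compare the three copies, which is exactly the step that has no direct counterpart for unknown quantum states.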


Besides C and Q, quantum channels have a third capacity, Q_2, for transmitting
qubits faithfully with the help of two-way classical communication between sender
and receiver. This classically-assisted quantum capacity will be discussed later,
after we describe another uniquely quantum form of communication, quantum
teleportation.
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Fig. 13.7. 


13.4 Applications of Quantum Entanglement 


Two forms of quantum information transmission having no classical counterpart, 
but closely related to each other, are quantum teleportation [11] (Fig. 13.7a) and 
quantum superdense coding [12] (Fig. 13.7b). These involve an initial stage in which 
an EPR pair is shared between two parties, followed by a second stage in which 
this shared entanglement is used to achieve, respectively, transmission of a qubit 
via two classical bits, or transmission of two classical bits via one qubit. Quantum
teleportation illustrates the fact that transmission of intact quantum states requires 
two qualitatively different resources, viz. a quantum resource that cannot be cloned, 
and a directed resource that cannot travel faster than light. In direct transmission 
of a qubit, these two functions are performed by the same particle. In teleportation 
the former function is provided by the shared EPR pair, the latter by the two 
classical bits. This situation may be summarized by saying that classical information 
theory involves one species of information, and one kind of noiseless communication 
primitive (transmission of a bit), whereas quantum information theory involves two 
species (classical information and entanglement), and three primitives (transmitting 
a bit, transmitting a qubit, and sharing an EPR pair) which are related through 
superdense coding and teleportation. 
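
The division of labor in teleportation between the EPR pair and the two classical bits can be traced in a small state-vector calculation. The following is a sketch, not a description of Fig. 13.7a: the gate sequence (CNOT, then Hadamard, then measurement on Alice's side, followed by conditional X and Z corrections on Bob's side) is the standard textbook circuit and is assumed here, as are the qubit labels and the function name.

```python
import math


def teleport(alpha, beta):
    """For each pair of classical bits (m0, m1) Alice might send, return the
    probability of that outcome and Bob's qubit after his correction."""
    s = 1 / math.sqrt(2)
    # Amplitudes indexed by (q0, q1, q2): q0 is the unknown qubit, q1 and q2
    # are Alice's and Bob's halves of the EPR pair (|00> + |11>)/sqrt(2).
    amp = {}
    for q0 in (0, 1):
        for q1 in (0, 1):
            for q2 in (0, 1):
                amp[q0, q1, q2] = (alpha, beta)[q0] * (s if q1 == q2 else 0.0)

    # Alice: CNOT with q0 as control and q1 as target.
    amp = {(q0, q1 ^ q0, q2): a for (q0, q1, q2), a in amp.items()}

    # Alice: Hadamard on q0.
    after_h = {}
    for q0 in (0, 1):
        for q1 in (0, 1):
            for q2 in (0, 1):
                after_h[q0, q1, q2] = s * (
                    amp[0, q1, q2] + (-1) ** q0 * amp[1, q1, q2]
                )

    results = {}
    for m0 in (0, 1):
        for m1 in (0, 1):
            # Bob's unnormalized state given Alice's measurement result.
            bob = [after_h[m0, m1, 0], after_h[m0, m1, 1]]
            prob = abs(bob[0]) ** 2 + abs(bob[1]) ** 2
            bob = [x / math.sqrt(prob) for x in bob]
            if m1:                       # X correction
                bob = [bob[1], bob[0]]
            if m0:                       # Z correction
                bob = [bob[0], -bob[1]]
            results[m0, m1] = (prob, bob)
    return results


alpha, beta = 0.6, 0.8j
for bits, (prob, bob) in teleport(alpha, beta).items():
    print(bits, round(prob, 3), bob)
```

Each of the four bit pairs occurs with probability 1/4 whatever the input state, so the classical message alone reveals nothing about the qubit, while Bob cannot complete the correction until it arrives.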


Aside from its avoidance of a direct quantum channel from Alice to Bob,
teleportation is noteworthy in that it is irreversible: one EPR pair’s worth of entanglement
is consumed, and two random bits, which are utterly uncorrelated with the state
being teleported, are generated and eventually must be discarded into the
environment, typically as waste heat, in obedience to Landauer’s principle. If a qubit is
teleported from Alice to Bob, and then back to Alice, there will be zero net effect
on the quantum system, but two EPR pairs of entanglement will have been used up
and four bits’ worth of waste heat generated. Such cyclic but irreversible processes
can also be found in classical thermodynamics, for example when an ideal gas freely
expands into an evacuated chamber, and then is isothermally compressed back to
its original volume.


Through teleportation, remote parties can use classical communication and 
shared EPR pairs to do any quantum operation they could have done had they 
been in the same location. For example, if Alice wants to bring one of her qubits
into interaction with one of Bob’s qubits, she can teleport it to him, have him do the
interaction, then have him teleport the post-interaction qubit back to her. Conversely,
one may ask if there is a single primitive interaction to which all communication 
can be reduced. It is evident that all forms of communication, classical or quan- 
tum, can be implemented if we have the ability to do local operations and non-local 
CNOTs between Alice’s and Bob’s data. In view of this, it is reasonable to take 
as our primitive of interaction any Hamiltonian which generates a two-qubit gate 
equivalent to an Alice-Bob CNOT when integrated over time, for example 


H = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \epsilon \end{pmatrix}, (13.1)


which does so when allowed to act for a time π/ε.


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Returning to entanglement, it is evident that it is a valuable resource comple- 
mentary in many respects to classical communication. For this reason, it would 
be good to have a way of measuring it, not only for maximally-entangled EPR 
states, but also for partly entangled pure states of a bipartite system such as 
cos θ|00⟩ + sin θ|11⟩ and for mixed states. For pure states, a good measure is
the “entropy of entanglement,” defined as the von Neumann entropy of either
the Alice or Bob subsystem taken alone. In the example given, this entropy is
H_2(cos² θ) = −cos² θ log_2 cos² θ − sin² θ log_2 sin² θ. Entropy of entanglement is a
good entanglement measure for pure states because for any pure state Ψ of a
bipartite system, it is asymptotically equal [13] both to the number of standard EPR
states required to prepare one instance of Ψ, and the number of standard EPR
pairs that can be prepared from one instance of Ψ, using local operations and
classical communication. This fungibility of entanglement justifies the term ebit for the
amount of entanglement in a maximally-entangled EPR state of two qubits, e.g. a
singlet state (|↑↓⟩ − |↓↑⟩)/√2 of two spin-1/2 particles.
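
The entropy of entanglement of the example state is easy to tabulate. The short Python sketch below evaluates H_2(cos² θ) for a few angles; the function name and the choice of angles are assumptions made for illustration.

```python
import math


def entropy_of_entanglement(theta):
    """H2(cos^2 theta) in ebits for the state cos(theta)|00> + sin(theta)|11>."""
    probabilities = [math.cos(theta) ** 2, math.sin(theta) ** 2]
    return -sum(p * math.log2(p) for p in probabilities if p > 1e-15)


for degrees in (15, 30, 45):
    print(degrees, round(entropy_of_entanglement(math.radians(degrees)), 4))
```

The value rises from 0 for a product state (θ = 0) to exactly one ebit at θ = 45°, where the state is a maximally entangled EPR state; in the asymptotic sense described above, a partly entangled state with entropy E is worth E such pairs.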


For mixed states, the situation is more complicated [14], and it appears likely 
that for some mixed states, the entanglement of formation—the asymptotic number 
of pure ebits required to prepare an instance of the state by local operations and 
classical communication—may exceed the distillable entanglement—the asymptotic 
number of pure singlets that can be prepared from an instance of the state, using 
local operations and classical communication. For pure states both these measures 
reduce to the entropy of entanglement. 


The possibility of distilling pure entanglement from mixed states gives rise to
the classically-assisted quantum capacity Q_2 mentioned above. A noisy channel
N, if it is not too noisy, can be used for reliable quantum communication by the
following indirect procedure [8, 9].


1. Alice prepares a supply of EPR pairs and sends half of each pair to Bob
through the noisy channel. This leaves Alice and Bob with a supply of shared
but noisy EPR pairs, in other words, bipartite mixed states ρ_AB.


2. If the channel N was too noisy, the mixed states will have zero distillable
entanglement and nothing more can be done. On the other hand, if the shared
pairs have nonzero distillable entanglement, Alice and Bob, by performing
local operations and measurements and sacrificing some of the pairs, can distill
a smaller number of arbitrarily pure EPR pairs. This purification process (cf.
Fig. 13.8) resembles fractional distillation or water desalination, and uses
two-way classical communication between Alice and Bob to communicate their
measurement results and decide which pairs to keep and which to discard.


3. Alice then uses the good EPR pairs to teleport an unknown input state reliably
to Bob, with the help of additional classical communication.


The Q_2 capacity of the noisy channel N is defined as the asymptotic number of
reliable qubits per channel use that can be communicated in this fashion, with the
assistance of unrestricted two-way classical communication. Some noisy channels,
for example the 50 per cent depolarizing channel (which depolarizes half the
photons passing through), have positive Q_2 capacity even though their direct quantum
capacity Q is zero. For other channels, the two quantum capacities are equal. It is
not known whether there are channels for which Q_2 exceeds the classical capacity
C.


Transformation              Resource Produced                      Resources Used

Cloning                     |ψ⟩ ⊗ |0⟩ → |ψ⟩ ⊗ |ψ⟩                  Impossible
Classical Copying           |x⟩ ⊗ |0⟩ → |x⟩ ⊗ |x⟩              ≤   1 Bit→
EPR sharing                 1 Ebit                             ≤   1 Qubit→
Use Qubit to send Bit       1 Bit→                             ≤   1 Qubit→
Teleportation               1 Qubit→                           ≤   1 Ebit + 2 Bits→
Superdense coding           2 Bits→                            ≤   1 Ebit + 1 Qubit→
Quantum Source Coding       1 R-random Qubit→                  ≲   R Qubits→
Quantum Channel Coding      Q(N) noiseless Qubits→             ≲   1 use→ of channel N
Entang. Concentration       E pure singlets                    ≲   1 E-entangled pure state
Entanglement Dilution       1 E-entangled pure state           ≲   E singlets + Bits→
Class. Assisted Q Commun.   Q_2(N) noiseless Qubits→           ≲   1 use→ of N + Bits↔
Quantum Key Distrib.        Shared Secret Key Bit, or Failure  ≲   Eavesdropped Qubits→ + Bits↔


Table 13.1. In the first line, the cloning of an unknown state ψ is impossible and cannot
be achieved by use of any combination of resources. In the other lines, ≤ signifies an exact
reducibility, i.e. the resource on the left can be exactly produced by use of the resources on
the right. ≲ signifies an asymptotic reducibility, i.e. m instances of the resource on the left
can be approximately produced by use of n instances of the resources on the right, with
the ratio m/n and the fidelity of the approximation both approaching unity as n → ∞.
The superscript arrows (e.g. Bit→) indicate the direction of transmission for resources such
as classical bits, qubits, or noisy channel uses; Bit↔ indicates that bidirectional classical
communication is required. An R-random qubit is a qubit drawn from an ensemble of von
Neumann entropy R. An E-entangled pure state is a pure bipartite state having entropy
of entanglement E.


13.5 Quantum Cryptography 


Quantum cryptographic key distribution [9, 15-19, 22, 23] is a protocol involving
both quantum and classical communication among three parties, the legitimate
users Alice and Bob and an eavesdropper Eve. Alice and Bob’s goal is to use
quantum uncertainty to do something that would be impossible by purely classical public
communication: agree on a secret random bit string A, called a cryptographic key,
that is informationally secure in the sense that Eve has little or no information on
it¹. In the quantum protocol (cf. Fig. 13.9), Eve is allowed to interact with the
quantum information carriers (e.g. photons) en route from Alice to Bob, at the
risk of disturbing them, and can also passively listen to all classical
communication between Alice and Bob, but she cannot alter or suppress the classical messages.
Sometimes (e.g. if Eve jams or interacts strongly with the quantum signals)
Alice and Bob will conclude that the quantum signals have been excessively
disturbed, and therefore that no key can safely be agreed upon (designated by a frown
in the figure); but, conditionally on Alice and Bob’s concluding that it is safe to
agree on a key, Eve’s expected information on that key should be negligible.
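
How an eavesdropper's interaction shows up as a disturbance can be illustrated with a toy simulation. The sketch below assumes a specific protocol, the four-state BB84 scheme, and a specific attack, intercept and resend; neither is named in the text above, and the sketch omits the error correction and privacy amplification a real key agreement would need. All function names are chosen here for illustration.

```python
import random


def bb84(n_photons, eavesdrop, seed=0):
    """Simulate key sifting; returns (alice_key, bob_key) after basis comparison.

    Bases: 0 = rectilinear, 1 = diagonal. Measuring in the wrong basis gives a
    random bit and leaves the photon prepared in the measuring basis.
    """
    rng = random.Random(seed)

    def measure(bit, prepared_basis, measuring_basis):
        return bit if prepared_basis == measuring_basis else rng.randint(0, 1)

    alice_key, bob_key = [], []
    for _ in range(n_photons):
        bit, basis = rng.randint(0, 1), rng.randint(0, 1)
        sent_bit, sent_basis = bit, basis
        if eavesdrop:                      # intercept-resend attack by Eve
            eve_basis = rng.randint(0, 1)
            sent_bit = measure(bit, basis, eve_basis)
            sent_basis = eve_basis
        bob_basis = rng.randint(0, 1)
        bob_bit = measure(sent_bit, sent_basis, bob_basis)
        if bob_basis == basis:             # bases compared over the public channel
            alice_key.append(bit)
            bob_key.append(bob_bit)
    return alice_key, bob_key


def error_rate(alice_key, bob_key):
    """Fraction of sifted key positions on which Alice and Bob disagree."""
    return sum(a != b for a, b in zip(alice_key, bob_key)) / len(alice_key)


print(error_rate(*bb84(4000, eavesdrop=False)))   # undisturbed channel
print(error_rate(*bb84(4000, eavesdrop=True)))    # roughly a quarter disagree
```

In this toy model an undisturbed channel yields identical sifted keys, while the intercept-resend attack corrupts about one sifted bit in four, which is the kind of excessive disturbance that would lead Alice and Bob to abandon the key.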


13.6 Summary 


Table 13.1 compares some transformations of classical and quantum information in 
terms of resources produced and consumed. 
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in a key that is only computationally secure—an adversary with sufficient computing power could 
infer it from the messages exchanged between Alice and Bob. In particular, the most widely used 
classical key agreement protocols could be easily broken by a quantum computer, if one were 
available. 
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14 
QUANTUM COMPUTATION 


Richard J. Hughes 


Abstract 


The remarkable developments in theoretical and experimental quantum compu- 
tation that have been inspired by Feynman’s seminal papers on the subject are 
reviewed. Following an introduction to quantum computation, the implications for 
cryptography of quantum factoring are discussed. The requirements and challenges 
for practical quantum computational hardware are illustrated with an overview of 
the ion trap quantum computation project at Los Alamos. The physical limitations 
to quantum computation with trapped ions are analyzed and an assessment of the 
computational potential of the technology is made. 


“...it seems that the laws of physics present no barrier to reducing the
size of computers until bits are the size of atoms, and quantum behavior
holds dominant sway.” R. P. Feynman, 1985 [1].


“I think I can safely say that nobody understands quantum mechanics.”
R. P. Feynman, 1965 [2].


14.1 Introduction 


A naive extrapolation of computer technology suggests that within 20 years quan- 
tum phenomena will become relevant. This observation led Feynman [1] to inves- 
tigate how a computational device might be implemented with information repre- 
sented quantum mechanically. Specifically, he considered the representation of a 
single bit of information by an “atom” that is in one or the other of two possible 
states, denoted |0) and |1). (A single bit of this form is now known as a qubit.) 
An L-bit number can then be represented as a state of a “register” of L such two- 
state systems. Feynman’s motivation was to determine whether quantum physics, 
and the Uncertainty Principle specifically, imposed any limitations on computation. 
He was able to show that the reversible Boolean operations previously investigated 
in studies of the thermodynamics of computation [3, 4] could be implemented as
unitary transformations on quantum systems. However, in his conclusion he noted 
that he had “...not really used many of the specific qualities of the differential 
equations of quantum mechanics,” and that what he did was “... only to imitate as 
closely as possible the digital machine of conventional sequential architecture.” But 
in another paper Feynman [5] hinted that quantum mechanics might offer greater 
computational power than conventional computers. 
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Other authors have taken up the leads suggested in Feynman’s seminal papers 
(see also Benioff [6]). Instead of merely attempting to achieve some improvement 
over conventional computational technologies based on irreversible Boolean logic, 
in terms of clock speed for instance, the modern concept of a quantum computer is 
a truly new computational paradigm. For certain problems that are “intractable” 
on conventional computers the peculiarly quantum phenomena of superposition,
“entanglement” [7, 8]¹ and interference would allow a dramatic reduction in their
computational complexity on a quantum computer². In particular, quantum
algorithms with compelling applications to cryptography have been invented. As a
result, the field of quantum computation has seen tremendous growth, with realis- 
tic hardware proposals and related experiments; and the development of quantum 
error correcting codes. In this article I shall describe why quantum computation is 
now such an active field; why it is experimentally difficult; and the prospects for 
quantum computation with trapped ions. 


It was Deutsch [9, 10] who first suggested that by using non-Boolean unitary
operations the quantum superposition principle could be exploited to achieve greater
computational power than with conventional computation. Bernstein and
Vazirani [11] showed that Deutsch’s general model of quantum computation was both
efficient and “reasonable.” Their results were extended by Yao [12] who proved
that quantum circuits (introduced in Reference 10) are polynomially equivalent to
Deutsch’s quantum Turing machine of Reference 9. Early work by Deutsch and
Jozsa [13] showed how to exploit computationally the power afforded by the
superposition principle. But it was not until the work of Shor [14] in 1994 that
this “quantum parallelism” was shown to offer an efficient solution of an interest- 
ing computational problem. Building on earlier work of Simon [15], Shor invented 
polynomial-time quantum algorithms for solving the integer factorization and dis- 
crete logarithm problems [14]. The difficulty of solving these two problems with 
conventional computers underlies the security of much of modern public key cryptog- 
raphy [16]. Shor’s algorithms are sufficiently compelling that the daunting scientific 
and technological challenges involved in practical quantum computation are now 
worthy of serious experimental study. Since 1994 other interesting quantum algo- 
rithms, including a new class typified by Grover’s “database search” algorithm [17], 
have been invented, and there is considerable research directed toward defining a 
general mathematical framework encompassing all quantum algorithms [18-20]. 


¹“I would not call [quantum entanglement] one but rather the characteristic trait of
quantum mechanics, the one that enforces its entire departure from classical lines of
thought.” E. Schrödinger, 1935 [8].

²The clock speed of a quantum computer would probably not be particularly high, so for general
problems, outside the special class amenable to efficient quantum algorithms, it would not offer
any computational advantage.


When Feynman wrote his papers on quantum computation there were no
viable hardware schemes. Subsequently, various proposals requiring the invention of
new technologies were made, and fundamental problems with them were
identified [21, 22]. Since then there have been many relevant experimental developments
in the foundations of quantum mechanics, and although experimental quantum
computation is still in its infancy, there are now several very promising hardware
concepts based on existing technologies that avoid most of these fundamental
problems. One of these new schemes, using the quantum states of laser-cooled ions in
an electromagnetic trap [23], is particularly promising. In 1994 Cirac and Zoller
showed that such systems have the necessary characteristics to perform quantum
computation. The relevant coherence times can be adequately long; mechanisms for
performing the quantum logic gate operations exist; and a high-probability readout
method is possible. (For a detailed description see Reference 24.) Several groups,
including our own [25], are now investigating quantum computation with trapped
ions. A single logic operation using a trapped beryllium ion has been
demonstrated [26]. However, even algorithmically simple computations will require the
creation and controlled evolution of quantum states that are far more complex than
have so far been achieved experimentally. It is therefore important to quantify the
extent to which trapped ions could allow the quantum engineering of the complex
states required for quantum computation [27].


Just as with classical computers, quantum computers will have to cope with 
errors. However, quantum errors are much more challenging than their classical 
counterparts because they destroy the quantum coherences from which quantum 
computation derives its power. The phenomenon of “decoherence” [28] arising from 
interactions between a quantum system and its “environment” is invoked to explain 
the absence of macroscopic objects in quantum mechanical superposition states [7]. 
In quantum computers the damaging effects of decoherence will have to be con- 
trolled [29]. During the past two years there have been spectacular developments 
in the theory of quantum error correcting codes that protect quantum information 
from decoherence by exploiting entanglement [30, 31]. Furthermore, quantum error
correction procedures can be implemented in a fault-tolerant fashion [32], hold- 
ing out the prospect of unlimited quantum computation with imperfect physical 
implementations, if certain precision thresholds can be attained [33].


The rest of this paper is organized as follows. In Section 14.2 we review the 
basic principles of quantum computation and indicate why it is more powerful com- 
putationally than conventional computation for certain problems. In Section 14.3 
we consider the cryptographic implications of quantum factoring. In Section 14.4 
we describe the Cirac-Zoller scheme for ion trap quantum computation, and Sec- 
tion 14.5 is devoted to a description of the different qubit schemes possible with 
trapped ions. Section 14.6 contains estimates of the intrinsic limits to quantum 
computation with the two classes of qubits. Finally in Section 14.7 we present some 
conclusions. 
14.2 Basic concepts of quantum computation


The essential idea of quantum computation is to represent binary numbers by two-level
quantum systems, such as two energy levels of an electron in an atom or ion,
or the two possible states of a spin-1/2 particle or photon [34]. We will use the
notation $|0\rangle$, $|1\rangle$ to denote the two distinct quantum states of such a system. These
states form an orthonormal basis for the Hilbert space of this qubit, with


$$\langle 0|0\rangle = \langle 1|1\rangle = 1, \qquad \langle 0|1\rangle = \langle 1|0\rangle = 0 \qquad (14.1)$$


In general, the quantum state of a qubit can be written as a linear combination
with complex coefficients of these two states, in the form


$$|\Psi\rangle = \cos(\theta/2)|0\rangle - i e^{i\phi}\sin(\theta/2)|1\rangle \qquad (14.2)$$


whose time evolution is given by the Schrödinger equation

$$i\hbar\,\frac{\partial}{\partial t}|\Psi\rangle = H|\Psi\rangle \qquad (14.3)$$


where $\hbar$ is Planck’s constant divided by $2\pi$, and the Hamiltonian, $H$, is a Hermitian 2×2 matrix.
Typically, $H$ will be a sum of two terms,


$$H = H_0 + H_I \qquad (14.4)$$


where $H_0$ is the “free” Hamiltonian describing the evolution of the isolated qubit,
and $H_I$ is an interaction term, such as the interaction with an external “drive” that
effects transitions between qubit levels. The qubit basis states will be eigenstates
of $H_0$, with eigenvalues that may be chosen as


Hol) = (etn) cae 


It is common to work in the interaction representation in which the time dependence
of the qubit’s state is determined by $H_I$ alone. When the external drive can
be switched on and off for specified amounts of time, the time evolution will then 


be described by 
(in) (hy) 146) 


where U is a unitary 2x2 matrix of the form 


where € is some real number. 


$$U = \exp\left(-\frac{i}{\hbar}\int dt\, H_I\right) \qquad (14.7)$$
With multiple qubits it becomes possible to represent larger numbers in binary
notation. For example with an L-qubit quantum register, it is possible to represent
numbers, $a$, between 0 and $2^L - 1$, as the state


$$|a\rangle = \prod_{i=0}^{L-1} |a_i\rangle \qquad (14.8)$$

where

$$a = \sum_{i=0}^{L-1} a_i 2^i \qquad (14.9)$$


and $a_i = 0, 1$ is the $i$-th bit of $a$. Just as with a single qubit, we can consider
the time evolution of a quantum register as the action of a unitary operation.
In particular, the logic operations required for conventional computation can be
produced in this way. However, because the Schrödinger equation possesses time-reversal
symmetry we must use reversible Boolean logic with quantum states. It is
known that all conventional logic operations can be accomplished with the following 
three reversible logic operations [1]. 


The logical NOT operation, 


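$$\mathrm{NOT}: \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} \to \mathrm{NOT}\begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} c_1 \\ c_0 \end{pmatrix} \qquad (14.10)$$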


which we may also write as,


$$\mathrm{NOT}: |a\rangle \to |\bar{a}\rangle, \quad a = 0, 1 \qquad (14.11)$$

is readily verified to be reversible.


A reversible two-qubit operation is the controlled-NOT gate, in which the state
of one qubit (the “target”) is flipped if the other qubit (the “control”) has the value
“1” but unchanged if it has the value “0”:


$$\mathrm{CNOT}_{c,t}: |a\rangle_c|b\rangle_t \to |a\rangle_c|a \oplus b\rangle_t, \quad a, b = 0, 1 \qquad (14.12)$$


where the subscripts $c$, $t$ denote “control” and “target” respectively, and “$\oplus$” denotes
the logical exclusive-OR (“XOR”) operation or addition mod 2. Note that a
second application of this operation returns the state of the qubits to their starting
state. Also, note that this operation gives a reversible implementation of the
XOR operation, at the expense of carrying forward some additional information
(the control bit) that allows for reversibility.


A third reversible logical operation involves three qubits and is known as the
controlled-controlled-NOT (or Toffoli) operation:


$$\mathrm{CCNOT}_{c_1,c_2,t}: |a\rangle_{c_1}|b\rangle_{c_2}|c\rangle_t \to |a\rangle_{c_1}|b\rangle_{c_2}|(a \wedge b) \oplus c\rangle_t, \quad a, b, c = 0, 1 \qquad (14.13)$$

where “$\wedge$” is the logical AND operation.


From these three operations it is now straightforward to produce elementary
arithmetic operations, such as a simple adder, using a sequence of unitary operations:


$$\mathrm{ADD}: |a\rangle_1|b\rangle_2|0\rangle_3 \to \mathrm{CNOT}_{1,2}\left(\mathrm{CCNOT}_{1,2,3}|a\rangle_1|b\rangle_2|0\rangle_3\right) = |a\rangle_1|a \oplus b\rangle_2|a \wedge b\rangle_3 \qquad (14.14)$$


which produces the sum bit on the second qubit and the carry bit on the third
qubit. More complex combinations of the above primitives can be used to produce
arbitrary logic operations. But note that input data must typically be carried
forward to the output to allow for reversibility. Feynman showed that in general
the amount of extra information that must be carried forward is just the input
itself. So, to evaluate a function, $F$, reversibly we will need an input register with
the appropriate number of qubits to hold the function’s argument, $a$, and an output
register of enough qubits, initially all in the $|0\rangle$ state, to hold the value, $F(a)$:


$$|a\rangle|0\rangle \to |a\rangle|F(a)\rangle \qquad (14.15)$$
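The three reversible gates and the adder of Eq. 14.14 can be checked on classical basis states with a few lines of Python. This is only a sketch restricted to bit values (no superpositions); the function names are illustrative and do not come from the text.

```python
from itertools import product

def NOT(a):
    """Eq. 14.11: flip a single bit."""
    return 1 - a

def CNOT(control, target):
    """Eq. 14.12: flip the target when the control bit is 1."""
    return control, control ^ target

def CCNOT(c1, c2, target):
    """Eq. 14.13 (Toffoli): flip the target when both controls are 1."""
    return c1, c2, (c1 & c2) ^ target

def add(a, b):
    """Eq. 14.14: CCNOT writes the carry to bit 3, then CNOT writes the sum to bit 2."""
    a, b, carry = CCNOT(a, b, 0)
    a, total = CNOT(a, b)
    return a, total, carry

if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(f"a={a} b={b} -> (a, sum, carry) = {add(a, b)}")
```

The four inputs map to four distinct outputs, which is the reversibility property: the first argument is carried forward so that the input can be recovered from the output.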


The input register value can be prepared from the $|0\rangle$ state by applying NOT
operations to the appropriate bits and then the function value can be produced
in the output register using combinations of the above logical operations. The
readout of the result of the quantum computation is performed through quantum
measurements, which are represented as (non-unitary) projections in Hilbert space.
For a single qubit, the projection


$$P = |0\rangle\langle 0| \qquad (14.16)$$

projects onto the $|0\rangle$ state, giving the outputs


$$P|0\rangle = |0\rangle, \qquad P|1\rangle = 0 \qquad (14.17)$$

i.e. the $|0\rangle$ state passes the test, whereas the $|1\rangle$ state fails. With bit-wise projections
on a register a readout of the result of a computation can be obtained.


Thus far, the discussion has only shown how quantum systems might reproduce
the procedures of conventional computation, and has not revealed any particular
advantage to quantum mechanical computation: after each clock cycle the
quantum computer is in a basis state. However, the Schrödinger equation defines
a continuous time-evolution of a quantum state, so that we are not restricted to
the discrete Boolean operations introduced above. In general, the quantum state
of a qubit can be written as a coherent superposition state in which it has, in a
sense, both values at once (cf. Eq. 14.2). On applying a readout measurement,
$P$, to a qubit in a state such as Eq. 14.2, quantum mechanics only allows us to
predict that we will obtain the result, $|0\rangle$, with probability $\cos^2(\theta/2)$. (Also, note
that the effect of the measurement is to project the qubit’s state onto either $|0\rangle$ or
$|1\rangle$: a phenomenon known as “collapse of the wavefunction.”) Such superposition
states can be created from the computational basis states using non-Boolean unitary
operations such as


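$$V(\theta,\phi): \begin{cases} |0\rangle \to \cos(\theta/2)|0\rangle - i\exp(i\phi)\sin(\theta/2)|1\rangle \\ |1\rangle \to \cos(\theta/2)|1\rangle - i\exp(-i\phi)\sin(\theta/2)|0\rangle \end{cases} \qquad (14.18)$$

For example, the unitary operation $V(\pi/2, \pi/2)$ creates the equally-weighted superpositions


$$V(\pi/2,\pi/2): \begin{cases} |0\rangle \to 2^{-1/2}(|0\rangle + |1\rangle) \\ |1\rangle \to 2^{-1/2}(-|0\rangle + |1\rangle) \end{cases} \qquad (14.19)$$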


from the basis states, $|0\rangle$ and $|1\rangle$. Note, however, that these superposition states are
not simple statistical mixtures of $|0\rangle$ and $|1\rangle$, because a second application of the
same unitary operation produces the result


$$V^2(\pi/2,\pi/2): \begin{cases} |0\rangle \to |1\rangle \\ |1\rangle \to -|0\rangle \end{cases} \qquad (14.20)$$
illustrating the phenomenon of interference of quantum amplitudes. Other non-classical
states can be created from the basis states when the $V(\pi/2, \pi/2)$ operation
is combined with Boolean operations. For example, the two-gate sequence


$$\mathrm{CNOT}_{i,j}\,V_i(\pi/2,\pi/2)|0\rangle_i|0\rangle_j = 2^{-1/2}\left(|0\rangle_i|0\rangle_j + |1\rangle_i|1\rangle_j\right) \qquad (14.21)$$


(where $V_i$ acts on the $i$-th qubit) creates an “entangled” (non-factorizable) state,
in which both qubits have the same value, but neither qubit has a definite value.
(Multiparticle entangled states occur throughout quantum computation.) Clearly 
the operational methods for creating such non-classical states are of interest to the 
study of quantum mechanics, but they are also at the heart of the potential power 
of quantum computation to solve computational problems that are intractable on 
conventional computers. 
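The single-qubit rotation of Eq. 14.18, the interference result of Eq. 14.20 and the entangled state of Eq. 14.21 can be reproduced numerically. The following is a minimal NumPy sketch; the basis ordering and the variable names are assumptions made for the illustration.

```python
import numpy as np

def V(theta, phi):
    """Single-qubit unitary of Eq. 14.18 in the basis (|0>, |1>)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * np.exp(-1j * phi) * s],
                     [-1j * np.exp(1j * phi) * s, c]])

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

Vh = V(np.pi / 2, np.pi / 2)

# Eq. 14.19: equally weighted superpositions.
plus = Vh @ ket0    # (|0> + |1>)/sqrt(2)
minus = Vh @ ket1   # (-|0> + |1>)/sqrt(2)

# Eq. 14.20: a second application makes the amplitudes interfere.
twice0 = Vh @ Vh @ ket0   # |1>
twice1 = Vh @ Vh @ ket1   # -|0>

# Eq. 14.21: V on qubit i followed by CNOT (control i, target j).
# Assumed basis order for the pair is |i j> = |00>, |01>, |10>, |11>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
bell = CNOT @ np.kron(Vh, np.eye(2)) @ np.kron(ket0, ket0)

# Measurement probabilities of the entangled state in the computational basis.
probabilities = np.abs(bell) ** 2
print(np.round(probabilities, 3))
```

Only the outcomes 00 and 11 have non-zero probability, which is the sense in which both qubits have the same value although neither has a definite one.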


Applying the $V(\pi/2, \pi/2)$ operation to each of the bits in an L-bit register
produces the superposition state


$$|0\rangle_{L-1}|0\rangle_{L-2}\cdots|0\rangle_0 \to \frac{1}{\sqrt{2^L}}\prod_{i=0}^{L-1}\left(|0\rangle_i + |1\rangle_i\right) = \frac{1}{\sqrt{2^L}}\sum_{a=0}^{2^L-1}|a\rangle \qquad (14.22)$$


So using only L quantum operations, a superposition of $2^L$ states has been produced.
Of course, a measurement of this state would produce only one of the $2^L$ possible
values, $a$, with equal probability, showing that the quantum information in this
state is not all accessible at once. But this type of state is the starting point of a
chain of argument indicative of the potential power of quantum computation.


Consider now our earlier example of quantum function evaluation, Eq. 14.15.
We could replace the input by the above equally-weighted superposition state, and
then in one operation of the quantum circuit for the function $F$ we would produce
all $2^L$ outputs in superposition:


$$\frac{1}{\sqrt{2^L}}\sum_{a=0}^{2^L-1}|a\rangle|0\rangle \to \frac{1}{\sqrt{2^L}}\sum_{a=0}^{2^L-1}|a\rangle|F(a)\rangle \qquad (14.23)$$


Still, this has only the appearance of exponential work in one operation because,
as above, only one function value can be obtained by a measurement of the final
state. However, if we were interested in some common property shared by the
function’s values, such as the function’s period, this can be determined efficiently if
we now bring into play the notion of interference. Shor showed [14] how to extract
the period of a sequence, represented as a superposition of states of an L-qubit
quantum register, using the quantum Fourier transform (QFT)


$$|a\rangle \to 2^{-L/2}\sum_{c=0}^{2^L-1}\exp\left(\frac{2\pi i a c}{2^L}\right)|c\rangle \qquad (14.24)$$


(Note that this transformation is reversible.) Shor showed how to construct the
QFT using only $O(L^2)$ non-Boolean quantum operations, in contrast to the $O(L2^L)$
operations required for a conventional discrete Fourier transform. The QFT can be
constructed using a single-qubit unitary (Hadamard) operation, $H_j$, which acts on
the $j$-th qubit to give:

$$H_j: \begin{cases} |0\rangle_j \to 2^{-1/2}(|0\rangle_j + |1\rangle_j) \\ |1\rangle_j \to 2^{-1/2}(|0\rangle_j - |1\rangle_j) \end{cases} \qquad (14.25)$$


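and a two-qubit operation, $A_{jk}$, which acts on the $j$-th and $k$-th qubits to give:


$$A_{jk}: \begin{cases} |0\rangle_j|0\rangle_k \to |0\rangle_j|0\rangle_k \\ |0\rangle_j|1\rangle_k \to |0\rangle_j|1\rangle_k \\ |1\rangle_j|0\rangle_k \to |1\rangle_j|0\rangle_k \\ |1\rangle_j|1\rangle_k \to \exp(i\pi/2^{k-j})|1\rangle_j|1\rangle_k \end{cases} \qquad (14.26)$$


(Note the different phases in the H operation and the V operation.) The transformation
14.24 can then be performed as the sequence of operations (from left to
right):


$$H_{L-1}\,A_{L-2,L-1}\,H_{L-2}\,A_{L-3,L-1}\,A_{L-3,L-2}\,H_{L-3}\cdots H_1\,A_{0,L-1}\,A_{0,L-2}\cdots A_{0,1}\,H_0 \qquad (14.27)$$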
producing the state


$$2^{-L/2}\sum_{c=0}^{2^L-1}\exp\left(\frac{2\pi i a c}{2^L}\right)|b\rangle \qquad (14.28)$$


where $|b\rangle$ is the bit-reversed state of $|c\rangle$. It is therefore necessary to interpret the
bits in reverse order after the transformation in order to obtain the QFT. To see
this, note that the phase accumulated in the QFT of $|a\rangle$ to $|c\rangle$ is


$$\sum_{0 \le j < L}\pi a_j b_j + \sum_{0 \le j < k < L}\frac{\pi}{2^{k-j}}\,a_j b_k \qquad (14.29)$$

which can be rewritten as

$$\sum_{0 \le j+k < L}\frac{2\pi\,2^j 2^k}{2^L}\,a_j c_k \qquad (14.30)$$


where $b_k = c_{L-k-1}$, producing the desired result because adding multiples of $2\pi$
does not affect the phase.
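The gate sequence of Eq. 14.27 and the bit reversal in Eq. 14.28 can be verified by brute-force simulation on a small register. The sketch below assumes that qubit $j$ carries the $j$-th bit of the basis-state index; the helper names are illustrative.

```python
import numpy as np

def apply_H(state, j):
    """Hadamard of Eq. 14.25 on qubit j (bit j of the basis-state index)."""
    out = np.zeros_like(state)
    for idx, amp in enumerate(state):
        sign = -1 if (idx >> j) & 1 else 1
        out[idx & ~(1 << j)] += amp / np.sqrt(2)         # |0>_j component
        out[idx | (1 << j)] += sign * amp / np.sqrt(2)   # |1>_j component
    return out

def apply_A(state, j, k):
    """Conditional phase of Eq. 14.26 on qubits j < k."""
    out = state.copy()
    phase = np.exp(1j * np.pi / 2 ** (k - j))
    for idx in range(len(state)):
        if (idx >> j) & 1 and (idx >> k) & 1:
            out[idx] *= phase
    return out

def qft_circuit(state, L):
    """Gate sequence of Eq. 14.27, applied from left to right."""
    for j in range(L - 1, -1, -1):
        for k in range(L - 1, j, -1):
            state = apply_A(state, j, k)
        state = apply_H(state, j)
    return state

def bit_reverse(value, L):
    """Integer whose L-bit binary representation is that of value, reversed."""
    return int(format(value, f"0{L}b")[::-1], 2)

def check(L):
    """Compare the circuit output with Eq. 14.28 for every input |a>."""
    dim = 2 ** L
    for a in range(dim):
        basis = np.zeros(dim, dtype=complex)
        basis[a] = 1
        out = qft_circuit(basis, L)
        for b in range(dim):
            c = bit_reverse(b, L)
            expected = np.exp(2j * np.pi * a * c / dim) / np.sqrt(dim)
            if not np.isclose(out[b], expected):
                return False
    return True

print(check(4))
```

The simulation stores all $2^L$ amplitudes explicitly, so it is only a check of the algebra and says nothing about the efficiency of the quantum circuit itself.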


On four qubits, for example, the QFT is effected by the sequence of operations
(applied from left to right):


$$H_3\,A_{2,3}\,H_2\,A_{1,3}\,A_{1,2}\,H_1\,A_{0,3}\,A_{0,2}\,A_{0,1}\,H_0 \qquad (14.31)$$


To see how the QFT can be used, we now consider two sequences, each of period 2,
represented as superpositions of the states of four qubits,


$$|\Psi_e\rangle = \frac{1}{\sqrt{8}}\sum_{q=0}^{7}|2q\rangle \qquad (14.32)$$

and,

$$|\Psi_o\rangle = \frac{1}{\sqrt{8}}\sum_{q=0}^{7}|2q+1\rangle \qquad (14.33)$$


Applying the QFT 14.24 produces the states:

$$|\Psi_e\rangle \to 2^{-1/2}(|0\rangle + |8\rangle) \qquad (14.34)$$

and

$$|\Psi_o\rangle \to 2^{-1/2}(|0\rangle - |8\rangle) \qquad (14.35)$$


In each case a measurement of the final state produces a result of $n \cdot 16/r$, where $r = 2$
is the period of the sequence, and $n = 0, 1$. So for each state there is a
50% probability that the period, $r$, which is a common property, can be determined.
(Note that this illustrates the probabilistic nature of quantum algorithms. In the
cases when the outcome is 0 the computation would have to be repeated.) Despite
the shift between the above two sequences, the QFT allows their period to be
extracted. This “shift-invariance” of the QFT is important to its usefulness in
quantum algorithms. For example, if we consider the function


$$F(i) = \begin{cases} 0, & i \text{ even} \\ 1, & i \text{ odd} \end{cases} \qquad i = 0, 1, \ldots, 15 \qquad (14.36)$$


using a four-qubit “left” register to hold the arguments of $F$ and a single-qubit
“right” register to hold the values of $F$, the state


$$\frac{1}{4}\sum_{i=0}^{15}|i\rangle_L|F(i)\rangle_R = 2^{-2}\sum_{q=0}^{7}\left(|2q\rangle_L|0\rangle_R + |2q+1\rangle_L|1\rangle_R\right) \qquad (14.37)$$


can be produced with appropriate logic operations. Then after applying the QFT
to the left register the resulting state is


$$\frac{1}{2}\left[(|0\rangle_L + |8\rangle_L)|0\rangle_R + (|0\rangle_L - |8\rangle_L)|1\rangle_R\right] \qquad (14.38)$$


from which the period of $F$ can be extracted with 50% probability on measurement
of the left register, independently of the values of $F$. Generalizations of this method
for determining the period of a function are at the heart of the quantum factoring
and quantum discrete log algorithms.
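The four-qubit example of Eqs. 14.32 to 14.38 is small enough to check directly with the QFT written as a 16×16 matrix. This is a sketch for illustration only; representing the two-register state as a 16×2 array is an assumption of the example, not a construction taken from the text.

```python
import numpy as np

n = 16  # four qubits
a = np.arange(n)
# QFT of Eq. 14.24 as a matrix: QFT[c, a] = exp(2*pi*i*a*c/n) / sqrt(n)
QFT = np.exp(2j * np.pi * np.outer(a, a) / n) / np.sqrt(n)

psi_e = np.zeros(n, dtype=complex)
psi_e[0::2] = 1 / np.sqrt(8)   # Eq. 14.32: even arguments
psi_o = np.zeros(n, dtype=complex)
psi_o[1::2] = 1 / np.sqrt(8)   # Eq. 14.33: odd arguments

out_e = QFT @ psi_e   # Eq. 14.34
out_o = QFT @ psi_o   # Eq. 14.35

# Two-register state of Eq. 14.37: rows index the left register, columns the right.
joint = np.zeros((n, 2), dtype=complex)
for i in range(n):
    joint[i, i % 2] = 1 / 4
joint_out = QFT @ joint   # QFT acts on the left register only (Eq. 14.38)

# Probability of each left-register outcome, whatever the right register holds.
p_left = (np.abs(joint_out) ** 2).sum(axis=1)
for outcome in np.flatnonzero(p_left > 1e-12):
    note = f"period r = {n // outcome}" if outcome else "no information, repeat"
    print(f"outcome {outcome}: probability {p_left[outcome]:.2f} ({note})")
```

Reading the period as 16 divided by the outcome works here because the only non-zero outcome is 16/r; for general periods the outcome is a multiple n·16/r and more classical post-processing is needed.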


Here then is how non-Boolean quantum operations can be used to efficiently 
solve certain problems on a quantum computer: Information can be created and 
processed efficiently as superpositions of quantum amplitudes to reveal common 
features through the phenomenon of multi-particle quantum interference. By operating
on only L qubits, it is (in a sense) possible to compute with $2^L$ quantities
in parallel. To obtain the computational advantages of quantum parallelism it is 
essential that no measurements are made before the computation is complete, oth- 
erwise the collapse of the register’s wavefunction will destroy the large amount of 
quantum information whose interference is required for the solution of the problem. 
Clearly we can avoid making any intentional measurements until the calculation is 
complete, but it is much more difficult to control decoherence, which is the main 
challenge for practical quantum computation. 


All known quantum algorithms for solving interesting problems use either the 
QFT or one of its variants such as the quantum Hadamard transformation, to 
exploit multi-particle quantum interference [35]. Shor’s algorithms and Simon’s 
algorithm [15] have inspired a mathematical generalization of this type of quantum 
algorithm, which may lead to the development of efficient quantum algorithms for 
solving other problems. In each case, the problem can be phrased as finding a
hidden subgroup, H, of an Abelian group, G, given a function on G that takes on
different but constant values on the cosets of H. By constructing efficient quantum 
Fourier transforms over (Abelian) groups, this general class of problem is amenable 
to solution by quantum computation [20]. An active area of current research is the 
generalization to non-Abelian groups, such as the symmetric group. 


A mathematically distinct class of quantum algorithms consists of those based on
Grover’s “database search” algorithm. In this problem one is presented with a
table of $N$ elements and asked to find the one element that is “marked” (satisfies
some property). Classically, this can be solved by making $O(N)$ queries of the table.
Grover’s quantum algorithm solves this problem with only $O(N^{1/2})$ quantum operations.
One version of this algorithm [36] is provably optimal [37]. Note that this
class of quantum algorithm does not achieve the exponential speed-up of factoring,
but the “square-root” improvement is nevertheless very significant. However, the
most celebrated quantum algorithm is Shor’s quantum factoring algorithm, which
we turn to next.
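The square-root scaling can be seen in a small state-vector simulation. The text does not spell out Grover's iteration, so the sketch below assumes the standard construction (an oracle that flips the sign of the marked amplitude, followed by inversion about the mean) and the usual choice of about $(\pi/4)N^{1/2}$ iterations; all names are illustrative.

```python
import numpy as np

def grover_search(num_items, marked, iterations=None):
    """Return the measurement probabilities after the Grover iterations."""
    if iterations is None:
        iterations = int(np.floor(np.pi / 4 * np.sqrt(num_items)))
    state = np.full(num_items, 1 / np.sqrt(num_items))  # uniform superposition
    for _ in range(iterations):
        state[marked] *= -1                # oracle: flip the sign of the marked item
        state = 2 * state.mean() - state   # inversion about the mean
    return state ** 2

probs = grover_search(num_items=16, marked=11)
print(int(np.argmax(probs)), round(float(probs[11]), 3))
```

With 16 items, three iterations already concentrate most of the probability on the marked entry, whereas a classical search would need to examine 8 entries on average.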


14.3 Quantum Computation and Public Key Cryptography 


“The problem of distinguishing prime numbers from composite numbers 
and of resolving the latter into their prime factors is known to be one of 
the most important and useful in arithmetic.” K. F. Gauss, 1801 [38]. 


Every integer can be uniquely decomposed into a product of prime numbers. 
Most integers are easy to factor because they are products of small primes, but 
large integers (hundreds of digits in length) that are products of two, distinct, 
comparably-sized primes can be very difficult to factor with conventional comput- 
ers [39]. For example, in 1994 the 129-digit number known as RSA129 [40] required 
5,000 MIPS-years of computer time over an 8-month period, using more than 1,000 
workstations, to determine its 64-digit and 65-digit prime factors [41]. (By convention
one MIPS-year is about $3 \times 10^{13}$ instructions. Current workstations are rated at
200-800 MIPS.) The perceived difficulty of factoring with conventional computers
underlies the security of widely-used public key cryptosystems. But a quantum 
computer (QC) using Shor’s algorithm at a clock speed of 100 MHz would have 
factored RSA129 in only a few seconds. It is often necessary to ensure that en- 
crypted information remains secure for decades, but when encrypted information is 
transmitted we must assume that it can be monitored and saved for future anal- 
ysis by eavesdroppers. If information must be secure for X years, a cryptosystem 
must no longer be used X years before it is projected to become vulnerable. So the 
possibility that quantum computers could become feasible is not just a potential 
challenge to the use of public key cryptography in the future, but is a concern for 
the use of these cryptosystems today [42]. 


The RSA cryptosystem [43] is based on the following computationally difficult
problem:


RSA problem: Given an integer $N$ that is a product of two distinct primes,
$p$ and $q$, an integer $e$ such that g.c.d.$(e, (p-1)(q-1)) = 1$, and an integer $C$, find
the integer $M$ such that $C = M^e \bmod N$. (Here “g.c.d.” denotes “greatest common
divisor,” and “mod N” indicates that arithmetic is being performed modulo $N$.
Solving this problem is conjectured to be equivalent to factoring.)


To understand the significance of the RSA problem we first introduce Euler’s
totient function, which for an integer $m$ is defined as,


$$\phi(m) = \text{number of positive integers less than } m \text{ that are relatively prime to } m. \qquad (14.39)$$


Thus, for a prime, $p$, $\phi(p) = p - 1$, and for composite moduli of the form $N = pq$,
introduced above we have,


$$\phi(N) = (p-1)(q-1) \qquad (14.40)$$


We will also need the following theorem of Euler, which states that for any integer
$x$, relatively prime to $m$, i.e. g.c.d.$(x, m) = 1$,


$$x^{\phi(m)} \equiv 1 \bmod m \qquad (14.41)$$

Therefore, we can solve the RSA problem if we can find the integer $d$ defined by

$$d = e^{\phi(\phi(N))-1} \bmod \phi(N) \qquad (14.42)$$

(so that $ed = k\phi(N) + 1$), because then

$$C^d \equiv M^{ed} \equiv M^{k\phi(N)+1} \equiv M \bmod N \qquad (14.43)$$


by Euler’s theorem, Eq. 14.41, for some integer $k$ [44]. Clearly, if we know $\phi(N)$ we
can find the integer $d$, and we can determine $\phi(N)$ if we know the factors, $p$ and $q$,
of $N$. Thus the RSA problem can be solved if we can factor the modulus $N$, which
is computationally hard.


In the RSA cryptosystem the above problem is used to provide cryptographic 
security as follows. Alice wishes to send a (plaintext) message M (a large integer) 
to Bob, but wants to be sure that the eavesdropper Eve cannot read the message. 
So: 


1. Bob generates two large, distinct (secret) primes, $p$ and $q$;


2. He computes their product, $N$, and the integer $\phi(N)$;


3. Bob selects an integer $e$, such that g.c.d.$(e, \phi(N)) = 1$, and computes the
integer $d$ as above;


4. Bob publishes his public key, comprising the modulus $N$ and encrypting
exponent, $e$, but keeps his private key (decrypting exponent) $d$ (as well as $p$
and $q$) secret;


5. Alice uses Bob’s public key to compute the ciphertext $C = M^e \bmod N$, and
sends $C$ to Bob;


6. Bob recovers Alice’s message, $M$, using his secret decrypting exponent, $d$, as
above.
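The six steps can be traced with toy numbers. The primes, exponent and message below are arbitrary small values chosen for the illustration (real keys use primes hundreds of digits long), and the helper `totient` is a hypothetical brute-force implementation of Eq. 14.39.

```python
from math import gcd

def totient(m):
    """Euler's totient (Eq. 14.39) by direct counting; only sensible for small m."""
    return sum(1 for k in range(1, m) if gcd(k, m) == 1)

# Steps 1-2: Bob's secret primes and the public modulus (toy sizes).
p, q = 61, 53
N = p * q
phi_N = (p - 1) * (q - 1)               # Eq. 14.40

# Step 3: encrypting exponent e and decrypting exponent d.
e = 17
assert gcd(e, phi_N) == 1
d = pow(e, totient(phi_N) - 1, phi_N)   # Eq. 14.42
assert d == pow(e, -1, phi_N)           # same value via the modular inverse (Python 3.8+)

# Steps 5-6: Alice encrypts with (N, e); Bob decrypts with d.
M = 65
C = pow(M, e, N)
recovered = pow(C, d, N)
print(N, e, d, C, recovered)
```

Eve sees only `N`, `e` and `C`; computing `d` the way Bob does requires `phi_N`, and hence the factors of `N`.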


With an $\ell$-bit modulus, $N$, Alice and Bob can encrypt and decrypt their messages
using only $O(\ell^2)$ operations, but if Eve wants to decrypt Alice’s communication she
is faced with the computationally hard problem of factoring Bob’s modulus, $N$.
One way for Eve to factor $N$ would be to perform trial divisions by all primes less
than $N^{1/2}$. However, this would require $O(\exp[\ell])$ divisions, and so the amount of
computational work required would grow exponentially with the size of the modulus.


Modern factoring algorithms (including Shor’s quantum factoring algorithm)
use a different strategy [45]: they search for non-trivial solutions, $y$ (that is,
$y \not\equiv \pm 1 \bmod N$), of Legendre’s congruence,


$$y^2 \equiv 1 \bmod N \qquad (14.44)$$

from which we have,

$$(y+1)(y-1) \equiv 0 \bmod N \qquad (14.45)$$


Then the factors of $N$ will be distributed between the two parentheses, and can be
found using


$$\mathrm{g.c.d.}(y \pm 1, N) = \text{factor of } N \qquad (14.46)$$


The General Number Field Sieve (GNFS) algorithm [46] is the best algorithm in use
today for factoring large integers. It is much more efficient than trial division, with
a sub-exponential run-time growth, $O[\exp(1.923\,\ell^{1/3}(\ln \ell)^{2/3})]$, and is well-adapted
to distributed processing. This algorithm was recently used to factor the 130-digit
number known as RSA130 [47] in 500 MIPS-years of computer time.


To factor an $\ell$-bit integer, $N$, Shor’s quantum factoring algorithm requires a
classical integer, $x$, that is relatively prime to $N$. This integer is obtained by
classical computation, and then the function [48]


$$f(a) = x^a \bmod N, \quad a = 0, 1, \ldots, N^2 - 1 \qquad (14.47)$$

is computed quantum mechanically. This function is periodic, with period, $r$,


$$f(a + r) = f(a) \qquad (14.48)$$

where $r$ (known as the order of $x$) is the smallest positive integer for which

$$x^r \equiv 1 \bmod N \qquad (14.49)$$


After applying the QFT and measuring the result as illustrated in Section 14.2,
there is some probability, determined by both number theory and quantum physics,
that the order of $x$ can be found. If the order, $r$, is even, the congruence


$$(x^{r/2} - 1)(x^{r/2} + 1) \equiv 0 \bmod N \qquad (14.50)$$


can be used to factor $N$ using conventional computation, as described above.
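The classical part of this procedure is easy to demonstrate on small numbers. In the sketch below the order $r$ is found by a brute-force loop, which is a classical stand-in for the quantum period-finding step and takes exponential time for large $N$; the function names are illustrative.

```python
from math import gcd

def order(x, N):
    """Smallest r > 0 with x**r = 1 mod N, found by brute force.
    This classical loop stands in for the quantum period-finding step."""
    r, value = 1, x % N
    while value != 1:
        value = (value * x) % N
        r += 1
    return r

def factor_from_order(x, N):
    """Classical post-processing of Eq. 14.50; returns None if this x is unlucky."""
    if gcd(x, N) != 1:
        return None              # x already shares a factor with N
    r = order(x, N)
    if r % 2 == 1:
        return None              # odd order: try another x
    y = pow(x, r // 2, N)
    if y == N - 1:
        return None              # trivial solution y = -1 mod N
    return gcd(y - 1, N), gcd(y + 1, N)

print(factor_from_order(7, 15))
print(factor_from_order(2, 21))
```

For $N = 15$ and $x = 7$ the order is 4, so $y = 7^2 \bmod 15 = 4$ and the two g.c.d.s give the factors 3 and 5. A choice such as $x = 14$ gives $y \equiv -1$ and must be discarded, which is one source of the probabilistic character of the algorithm.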


Shor’s algorithm therefore requires one $2\ell$-bit register to hold the argument of
the function, $f$; an $\ell$-bit register to hold the function values, and some additional
memory to allow reversible computation of the function. The amount of additional
memory and number of quantum logic gates is somewhat dependent on the specific
implementation of the algorithm [49], but in our recent improved version [50] a QC
would need $L$ qubits of memory and $n_g$ quantum logic operations, with


$$L = 5\ell + 4, \qquad n_g = 25\ell^3 + O(\ell^2) \qquad (14.51)$$


to factor an $\ell$-bit modulus. The number of logic operations is dominated by the
computation of Shor’s function, $f$. In contrast to the (sub)exponential run-time
growth of the classical GNFS factoring algorithm, the quantum algorithm has a
dramatically slower, polynomial, $O(\ell^3)$, growth. (The $\ell^3$-dependence can be understood
as arising from the (conditional) multiplication of $2\ell$ classical $\ell$-bit integers
to build the function, $f$. Each of the multiplications requires $O(\ell^2)$ bit-additions,
using “elementary school” multiplication, that can be reduced to elementary quantum
logic operations.) If we assume a nominal clock speed of 100 MHz for a QC we
find that a 512-bit integer could be factored in about 30 seconds. Furthermore, the
quantum factoring algorithm used for this estimate is not optimized and significant
improvements are possible.
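The 30-second estimate follows directly from Eq. 14.51. The short sketch below evaluates the qubit and gate counts for a few modulus sizes; it keeps only the leading $25\ell^3$ term and assumes one logic operation per clock cycle, which is a simplification made for the illustration.

```python
def qubits_needed(bits):
    """Memory requirement L = 5*l + 4 of Eq. 14.51."""
    return 5 * bits + 4

def gate_count(bits):
    """Leading term of n_g in Eq. 14.51; the O(l**2) correction is dropped."""
    return 25 * bits ** 3

def run_time_seconds(bits, clock_hz=100e6):
    """Run time assuming one logic operation per clock cycle."""
    return gate_count(bits) / clock_hz

for bits in (512, 1024, 2048):
    print(bits, qubits_needed(bits), f"{gate_count(bits):.2e}",
          f"{run_time_seconds(bits):.0f} s")
```

Doubling the size of the modulus multiplies the gate count, and hence the run time, by only a factor of eight.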


Of course, improvements in conventional computers are also anticipated, but 
because of the (sub-)exponential run-time growth of conventional factoring algo- 
rithms the security of public key cryptosystems against conventional attacks can be 
maintained for many years by using integers with only 1,024 or 2,048 bits. In con- 
trast, because of the slow polynomial growth of the quantum factoring algorithm’s 
run-time, it would not be possible to easily ensure security against possible future 
quantum attacks by only making such modest increases in the size of the modulus. 
Integers with exponentially larger numbers of bits would be required for security, 
and integers of such a size would render the encryption and decryption procedures 
prohibitively long. 
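The contrast between the two growth laws can be made concrete by asking how much extra work a doubling of the modulus size costs each kind of attacker. This sketch uses the GNFS run-time expression exactly as quoted above, with $\ell$ in bits and the unknown constant prefactor cancelling in the ratio; it is an order-of-magnitude illustration rather than a security estimate.

```python
from math import exp, log

def gnfs_work(bits):
    """Classical run-time growth as quoted in the text, up to a constant factor."""
    return exp(1.923 * bits ** (1 / 3) * log(bits) ** (2 / 3))

def shor_work(bits):
    """Leading term of the quantum gate count, 25 * l**3."""
    return 25 * bits ** 3

for small, large in ((512, 1024), (1024, 2048)):
    classical = gnfs_work(large) / gnfs_work(small)
    quantum = shor_work(large) / shor_work(small)
    print(f"{small} -> {large} bits: classical x{classical:.1e}, quantum x{quantum:.0f}")
```

Each doubling raises the classical cost by many orders of magnitude but the quantum cost by only a factor of eight, which is why modest increases in key length do not restore security against a quantum attack.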


From this simple analysis we see that because of the very large numbers of qubits
and the long coherence times required we cannot yet state that quantum factoring
of cryptographically significant integers will become possible. But the possibility
that quantum factoring might become feasible in 20 years’ time (say) should be a
serious concern for public-key cryptography today.


Fig. 14.1. Linear radio frequency quadrupole ion trap for quantum computation.


Shor also invented a polynomial-time quantum algorithm for solving the discrete 
logarithm problem (DLP) [14]. 


Discrete logarithm problem: Given a prime number, $p$, an integer $g < p$ and
another integer $y < p$, find the integer $x$, such that $g^x \equiv y \bmod p$.


This problem, which like factoring is computationally intractable, is also widely 
used in public key cryptography. 


14.4 Quantum Computational Hardware 


From the foregoing we conclude that there are three essential requirements for quan- 
tum computation hardware. Firstly, it must be possible to prepare multiple qubits, 
adequately isolated from interactions with their environment for the duration of 
computation, in an addressable form. Secondly, there must be an external drive 
mechanism for performing the requisite quantum logic operations, with the requi- 
site careful and precise control of the qubits’ phases. And thirdly, there must be 
a readout mechanism for measuring the state of each qubit at the end of the com- 
putation. It is clear that it is much easier to write down a sequence of quantum 
logic operations than it is to perform them in the laboratory. Nevertheless, the 
above conditions can be satisfied with laser-cooled trapped ions [23], nuclear magnetic
resonance [51] and cavity quantum electrodynamics [52]. Other technologies
that may be suitable for quantum computation include quantum dots and superconducting
circuits. We will illustrate the experimental challenges and prospects
for practical quantum computation with the example of the trapped ion experiment
that is being developed at Los Alamos [25]. In an ion trap quantum computer [23]
a qubit would comprise two long-lived internal states, which we shall denote $|0\rangle$
and $|1\rangle$, of an ion isolated from the environment by the electromagnetic fields of
a linear radio-frequency quadrupole (RFQ) ion trap. Many different ion species
are suitable for quantum computation, and several different qubit schemes are possible,
as we shall see below. For example, at Los Alamos we are developing an
ion-trap quantum-computer experiment using calcium ions, with the ultimate objective
of performing multiple gate operations on a register of several qubits (and
possibly small computations) in order to determine the potential and physical limitations
of this technology [25]. We have chosen calcium ions for the convenience of
the wavelengths required. The heart of our experiment is a linear radio-frequency
quadrupole (RFQ) ion trap with cylindrical geometry in which strong radial confinement
is provided by radio-frequency potentials applied to four “rod” electrodes and
axial confinement is produced by a harmonic electrostatic potential applied by two
“end caps.” Our ion trap is about 1 cm long and 1.7 mm wide. (See Figure 14.1.)


Fig. 14.2. Relevant transitions in calcium ions, showing wavelengths and lifetimes of
metastable levels.


After Doppler cooling on their 397-nm S–P transition, calcium ions will become
localized along the ion trap’s axis [53] because their recoil energy (from photon
emission) is less than the spacing of the ions’ quantum vibrational energy levels in
the axial confining potential [54]. (See Figures 14.2 and 14.3.) Although localized
to distances much smaller than the wavelength of the cooling radiation, the ions
nevertheless undergo small amplitude oscillations. Their lowest frequency mode is
the axial center of mass (CM) motion in which all the ions oscillate in phase along
the trap axis. The frequency of this mode, whose quantum states will provide a
computational “bus,” is set by the axial potential. The inter-ion spacing is determined
by the equilibrium between this axial potential, which tends to push the ions
together, and the ions’ mutual Coulomb repulsion.


Fig. 14.3. Strings of calcium ions laser-cooled to rest in the Los Alamos quantum
computation ion trap. The image is formed by ion fluorescence in the cooling laser beam at
397 nm.


For example, with a 200-kHz axial CM frequency, the inter-ion spacing is on the 
order of 30 μm. After this first stage of cooling, the ions form a “quantum register”
in which one qubit can be addressed (with a suitable laser beam) in isolation from its 
neighbors. We have determined that more than 20 ions can be held in an optically 
addressable configuration. However, before quantum computation can take place, 
the quantum state of the ions’ CM mode must be prepared in its quantum ground 
state. 
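

The balance between the axial potential and the Coulomb repulsion can be checked
numerically in the simplest case. The sketch below is an illustration only: it assumes
just two singly charged 40Ca+ ions in a purely harmonic axial well, and the function
name is hypothetical. Each ion sits a distance d/2 from the trap centre, so the
restoring force m ω² d/2 is set equal to the Coulomb force e²/(4π ε₀ d²).

```python
import math

# Physical constants in SI units (CODATA values).
E_CHARGE = 1.602176634e-19       # elementary charge, C
EPSILON_0 = 8.8541878128e-12     # vacuum permittivity, F/m
ATOMIC_MASS = 1.66053906660e-27  # atomic mass unit, kg


def two_ion_separation(mass_amu, axial_freq_hz):
    """Equilibrium separation of two singly charged ions in a harmonic axial well.

    Force balance on one ion: m * w^2 * (d / 2) = e^2 / (4 * pi * eps0 * d^2),
    which gives d^3 = e^2 / (2 * pi * eps0 * m * w^2).
    """
    omega = 2.0 * math.pi * axial_freq_hz
    mass = mass_amu * ATOMIC_MASS
    d_cubed = E_CHARGE**2 / (2.0 * math.pi * EPSILON_0 * mass * omega**2)
    return d_cubed ** (1.0 / 3.0)


spacing = two_ion_separation(mass_amu=40.0, axial_freq_hz=200e3)
print(f"two 40Ca+ ions at 200 kHz: {spacing * 1e6:.1f} micrometres apart")
```

For two ions this gives a separation of roughly 16 μm, the same order of magnitude
as the figure quoted above. Spacings in longer strings are not uniform and are not
captured by this two-ion estimate.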


Because of the long radiative lifetime of the metastable 3D-states (~1 s), the
S-D electric quadrupole transition in calcium ions has such a narrow width that
it displays upper and lower sidebands separated from the central frequency by the
CM frequency. With a laser that has a suitably narrow linewidth, tuned to the
lower sideband, an additional stage of laser cooling (beyond Doppler cooling) can
be used to prepare the “bus” qubit (CM vibrational mode) in its lowest quantum
state (“sideband cooling”) [55]. On completion of this stage, the QC would have
all qubits in the $|0\rangle$ state, ready for quantum computation. (This second stage of
cooling could also be performed with Raman transitions.)


Fig. 14.4. Internal and motional quantum states of a trapped calcium ion.


The quantum state of the register of ions will then be manipulated by performing 
quantum logical gate operations that will be effected by directing a laser beam 
at individual ions for prescribed times. The laser-ion interaction will coherently 
change the state of the qubit through the phenomenon of Rabi oscillations. (Several 
different types of transition are possible.) As we will see below, the CNOT operation 
can be effected with the help of the quantum states of the ions’ CM motion to convey 
quantum information from one ion to the other. 


On completion of the quantum logic operations the result of the quantum computation
can be read out using the phenomenon of quantum jumps [56], by turning
on a laser that drives the transition between the $|0\rangle$ state and another ionic level
that decays rapidly back to $|0\rangle$. An ion in the $|0\rangle$ state will then fluoresce, whereas
an ion in the $|1\rangle$ state will remain dark. So, by observing which ions fluoresce and
which are dark, a bit value can be obtained.


14.5 Trapped ion qubits 


In addition to the two states, $|0\rangle$ and $|1\rangle$, comprising each ionic qubit, in an ion
trap QC there is also a computational “bus” qubit formed by the ground, $|g\rangle$, and
first excited state, $|e\rangle$, of the ions’ CM axial vibrational motion, which is used to
perform logic operations between qubits (see Figure 14.4).


Fig. 14.5. Quantum computational transitions with trapped ions.


By virtue of energy conservation (and possibly other selection rules) it is possible
to perform two types of coherent operations on a qubit, using laser pulses
directed at an ion: on-resonance transitions that change only an ion’s internal
state (“V” pulses); and red-sideband transitions (detuned from resonance by the
CM frequency) that change both the qubit’s internal state and the CM quantum
state (“U” pulses). (See Figure 14.5.) The V-pulse Hamiltonian for a particular
ion is

$$H_V = \frac{\hbar\Omega}{2}\left[e^{-i\phi}\,|1\rangle\langle 0| + e^{i\phi}\,|0\rangle\langle 1|\right] \qquad (14.52)$$

and the U-pulse Hamiltonian is

$$H_U = \frac{\hbar\eta\Omega}{2\sqrt{L}}\left[e^{-i\phi}\,|1\rangle\langle 0|\,a + e^{i\phi}\,|0\rangle\langle 1|\,a^{\dagger}\right] \qquad (14.53)$$


Here $\Omega$ is the Rabi frequency (proportional to the square root of the laser intensity,
$I$), $\phi$ is the phase of the laser drive, $\eta$ is the Lamb-Dicke parameter (characterizing the
strength of the interaction between the laser and the ions’ oscillations), $L$ is the
number of ions, and $a$ ($a^{\dagger}$) is the destruction (creation) operator for quanta of the
CM motion, satisfying


$$a|g\rangle = 0, \qquad a^{\dagger}|g\rangle = |e\rangle, \qquad [a, a^{\dagger}] = 1 \qquad (14.54)$$


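The unitary operations effected by applying these Hamiltonians to the m-th qubit
for a duration given by a parameter $\theta$ and phase $\phi$ are:


$$V_m(\theta,\phi):\quad
\begin{aligned}
|0\rangle_m &\to \cos(\theta/2)\,|0\rangle_m - i e^{i\phi}\sin(\theta/2)\,|1\rangle_m \\
|1\rangle_m &\to \cos(\theta/2)\,|1\rangle_m - i e^{-i\phi}\sin(\theta/2)\,|0\rangle_m
\end{aligned}
\qquad (14.55)$$


and


$$U_m(\theta,\phi):\quad
\begin{aligned}
|0\rangle_m|e\rangle &\to \cos(\theta/2)\,|0\rangle_m|e\rangle - i e^{i\phi}\sin(\theta/2)\,|1\rangle_m|g\rangle \\
|1\rangle_m|g\rangle &\to \cos(\theta/2)\,|1\rangle_m|g\rangle - i e^{-i\phi}\sin(\theta/2)\,|0\rangle_m|e\rangle
\end{aligned}
\qquad (14.56)$$


To perform logic operations on the qubits an additional red-detuned operation
involving the transition from $|0\rangle$ to an auxiliary level, $|aux\rangle$, in each qubit is required,
with Hamiltonian


$$H_U^{aux} = \frac{\hbar\eta\Omega}{2\sqrt{L}}\left[e^{-i\phi}\,|aux\rangle\langle 0|\,a + e^{i\phi}\,|0\rangle\langle aux|\,a^{\dagger}\right] \qquad (14.57)$$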
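with associated unitary operation $U_m^{aux}(\theta,\phi)$. For example, the controlled-sign-flip
(CSF) operation between two qubits, c and t,


$$CSF_{ct}:\quad
\begin{aligned}
|0\rangle_c|0\rangle_t &\to |0\rangle_c|0\rangle_t \\
|0\rangle_c|1\rangle_t &\to |0\rangle_c|1\rangle_t \\
|1\rangle_c|0\rangle_t &\to |1\rangle_c|0\rangle_t \\
|1\rangle_c|1\rangle_t &\to -|1\rangle_c|1\rangle_t
\end{aligned}
\qquad (14.58)$$


can be accomplished with the sequence of three U-pulses of appropriate duration:

$$CSF_{ct} = U_c(\pi, 0)\,U_t^{aux}(2\pi, 0)\,U_c(\pi, 0) \qquad (14.59)$$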
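
The three-pulse sequence can be followed state by state. The sketch below is an
illustration under stated assumptions, not the experiment's control code: the bus
mode starts in its ground state, only the levels that the sequence can populate are
tracked, and all names (`apply_pulse`, `U_C_PAIRS`, and so on) are hypothetical. Each
pulse applies the two-state rule of (14.56) to the pair of states it couples and
leaves every other basis state alone.

```python
import cmath
import math

TARGET_LEVELS = (0, 1, "aux")

# Basis labels are (control level, target level, bus state).
# U_c couples |0>_c|e> <-> |1>_c|g>, whatever level the target is in.
U_C_PAIRS = {(0, t, "e"): (1, t, "g") for t in TARGET_LEVELS}
# U_t^aux couples |0>_t|e> <-> |aux>_t|g>, whatever level the control is in.
U_T_AUX_PAIRS = {(c, 0, "e"): (c, "aux", "g") for c in (0, 1)}


def apply_pulse(state, pairs, theta, phi):
    """Apply one red-sideband pulse to `state`, a dict {basis label: amplitude}.

    `pairs` maps each state with a bus phonon to the state it is coupled to
    (ion excited, phonon removed). Uncoupled basis states pass through unchanged.
    """
    cos_half, sin_half = math.cos(theta / 2), math.sin(theta / 2)
    with_phonon_of = {no_ph: ph for ph, no_ph in pairs.items()}
    out = {}
    for label, amp in state.items():
        if label in pairs:                # phonon absorbed, phase e^{+i phi}
            partner, phase = pairs[label], cmath.exp(1j * phi)
        elif label in with_phonon_of:     # phonon emitted, phase e^{-i phi}
            partner, phase = with_phonon_of[label], cmath.exp(-1j * phi)
        else:
            out[label] = out.get(label, 0j) + amp
            continue
        out[label] = out.get(label, 0j) + cos_half * amp
        out[partner] = out.get(partner, 0j) - 1j * phase * sin_half * amp
    return out


def controlled_sign_flip(state):
    """pi pulse on the control, 2*pi auxiliary pulse on the target, pi on the control."""
    state = apply_pulse(state, U_C_PAIRS, math.pi, 0.0)
    state = apply_pulse(state, U_T_AUX_PAIRS, 2 * math.pi, 0.0)
    return apply_pulse(state, U_C_PAIRS, math.pi, 0.0)


signs = {}
for c in (0, 1):
    for t in (0, 1):
        final = controlled_sign_flip({(c, t, "g"): 1.0 + 0j})
        signs[(c, t)] = final[(c, t, "g")]
        print(f"|{c}>_c |{t}>_t |g>  ->  amplitude {signs[(c, t)].real:+.3f}")
```

Only the input with both qubits in $|1\rangle$ comes back with its sign reversed, and
the bus mode is returned to its ground state in every case, which is what allows it
to be reused for the next gate.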

From this operation a CNOT gate can be produced as

$$CNOT_{ct} = V_t(\pi/2, \pi/2)\,CSF_{ct}\,V_t(\pi/2, -\pi/2) \qquad (14.60)$$
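
The construction of the CNOT gate from the controlled-sign-flip can be verified with
a few lines of matrix arithmetic. The sketch below takes the V-pulse transformation
exactly as written in (14.55) and assumes that the two V-pulses on the target carry
opposite phases: $\phi = -\pi/2$ for the pulse applied first and $\phi = +\pi/2$ for the
pulse applied last. With the definitions of (14.55) this is the choice for which the
product is exactly a CNOT. Helper names are hypothetical.

```python
import cmath
import math


def v_pulse(theta, phi):
    """Matrix of V(theta, phi) in the basis (|0>, |1>), read off from (14.55)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * cmath.exp(-1j * phi) * s],
            [-1j * cmath.exp(1j * phi) * s, c]]


def on_target(gate):
    """Lift a single-qubit gate to the two-qubit basis |c t>, acting on t only."""
    big = [[0j] * 4 for _ in range(4)]
    for c in (0, 1):
        for row in (0, 1):
            for col in (0, 1):
                big[2 * c + row][2 * c + col] = gate[row][col]
    return big


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def close(a, b, tol=1e-12):
    """True when two 4x4 matrices agree element by element within `tol`."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(4) for j in range(4))


# Basis order |00>, |01>, |10>, |11> with the control qubit written first.
CSF = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

# The rightmost factor acts first: V_t(pi/2, -pi/2), then CSF, then V_t(pi/2, +pi/2).
first = on_target(v_pulse(math.pi / 2, -math.pi / 2))
last = on_target(v_pulse(math.pi / 2, math.pi / 2))
sequence = matmul(last, matmul(CSF, first))
print("sequence equals CNOT:", close(sequence, CNOT))
```

When the control is in $|0\rangle$ the two V-pulses simply undo one another; when it is in
$|1\rangle$ the sign flip between them turns the pair into a bit flip of the target.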


The speed of U- and V-pulse transitions is determined by the Rabi frequency, $\Omega$,
which is proportional to the square root of the laser intensity. But the U-pulses
are slower than the V-pulses because they must put the ions’ center of mass into
motion, which is a slower process with more ions, and moreover the Lamb-Dicke
parameter, $\eta$, is less than one. Because of their slowness (smallness of the coupling)
the U-operations are the rate-limiting steps in quantum logic operations. It
is therefore desirable to drive these transitions as quickly as possible. However,
the laser intensity cannot be made arbitrarily large, in order to avoid driving a
V-transition, for instance. In the following we shall count only the duration of the
U-pulses toward the computational time.
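

The size of the slowdown follows from comparing the coupling strengths in (14.52)
and (14.53): the U-pulse coupling is smaller than the V-pulse coupling by a factor
$\eta/\sqrt{L}$. The sketch below assumes that the pulse angle is the effective Rabi
frequency multiplied by the pulse duration, so that a $\pi$-pulse lasts $\pi/\Omega$ for a
V-pulse and $\pi\sqrt{L}/(\eta\Omega)$ for a U-pulse. The function name and the example
values are illustrative assumptions, not parameters of the experiment.

```python
import math


def pi_pulse_durations(rabi_freq, lamb_dicke, num_ions):
    """Return (t_V, t_U): durations of a pi-pulse for V- and U-type transitions.

    rabi_freq is the angular Rabi frequency Omega in rad/s.
    """
    t_v = math.pi / rabi_freq
    t_u = math.pi * math.sqrt(num_ions) / (lamb_dicke * rabi_freq)
    return t_v, t_u


# Hypothetical example: Omega = 2*pi * 100 kHz, eta = 0.01, ten ions.
t_v, t_u = pi_pulse_durations(2 * math.pi * 100e3, lamb_dicke=0.01, num_ions=10)
print(f"V pi-pulse: {t_v * 1e6:.1f} microseconds")
print(f"U pi-pulse: {t_u * 1e3:.2f} milliseconds")
```

With these example numbers the V-pulse takes 5 μs while the U-pulse takes about
1.6 ms, a ratio of $\sqrt{L}/\eta$, which is why only the U-pulses are counted.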


There are two classes of candidates for the qubit levels, which we shall refer
to as “metastable state” and “Raman” qubits, respectively. The first category
occurs in ions such as Hg⁺, Sr⁺, Ca⁺, Ba⁺ and Yb⁺ with first excited states that
are metastable, with lifetimes ranging from 0.1 s (Hg⁺), 0.4 s (Sr⁺) and 1 s (Ca⁺) (see
Figure 14.2) to 1 min (Ba⁺) and even 10 years (Yb⁺). A qubit comprises an
ion’s electronic ground (S) state ($|0\rangle$) and a sublevel ($|1\rangle$) of the metastable excited
state (a D-state in Hg, Ca or Ba; an F-state in Yb). The advantage of this scheme
is that it requires only a single laser beam to drive the qubit transitions, which
greatly simplifies the optics of ion addressing. However, the disadvantage of this
scheme is that it requires optical frequency stability of the laser drive that effects
coherent transitions between the qubit levels.


Raman qubit schemes use hyperfine sublevels of an ion’s ground state, or even
Zeeman sublevels in a small magnetic field for ions with zero nuclear spin, with
transitions between the qubit levels driven by Raman transitions. The advantages
of this type of scheme are that the qubit states can be much longer-lived than the
metastable state qubits; only radio frequency stability is required (corresponding to
the frequency difference between the sublevels); and there are many more possible
choices of ion (Be⁺, Ca⁺, Ba⁺ and Mg⁺, for example). Disadvantages are that
addressing of the qubits is more complex owing to the requirement for two laser
beams; and the readout is more involved than with metastable state qubits.


During quantum computation it is essential that a QC evolves through a se- 
quence of pure quantum states, prescribed by some quantum algorithm. In general 
there will be some time scale required for a particular computation, and other time 
scales characterizing the processes that lead to the loss of quantum coherence. By 
estimating these time scales we can determine if ion trap QCs have the necessary 
preconditions to allow quantum computation to be performed, and which systems 
are most favorable. Furthermore, certain decoherence mechanisms become more 
pronounced with larger numbers of qubits, and there are technological limits to the 
number of qubits that can be held and addressed. Therefore, there are also memory 
(space) limitations to quantum computation, as well as time limitations, and it will 
be important to determine how to optimize quantum algorithms to make best use 
of the available resources. In our experiment we have determined that more than 
20 ions can be held in a linear configuration and optically addressed with minimal 
cross-talk, using available technology [25]. 


The various decoherence mechanisms can be separated into two classes: fundamental
and technical. The former are limitations imposed by laws of Nature, such as
the spontaneous emission of a photon from a qubit level, or the breakdown of the 
two-level approximation if a qubit transition is driven with excessive laser power. 
The technical limits are those imposed by existing experimental techniques, such 
as the “heating” of the ions’ CM vibrational mode, or the phase stability of the 
laser driving the qubit transitions. One might expect that these limitations would 
become less restrictive as technology advances. 


14.6 Computational limits with trapped ion qubits 


We shall consider a quantum algorithm that requires $L$ qubits (ions), and $n$ laser
pulses (we count only the slow U-pulses), each of duration $t$ (a $\pi$-pulse, $\theta = \pi$, for
definiteness). With metastable state qubits, spontaneous emission of just one photon
from one of the qubits’ $|1\rangle$ states will destroy the quantum coherence required
to complete this computation, so we may set an upper limit on the computational
time, $nt$, in terms of the spontaneous emission lifetime of this level, $\tau_0$. The specific
form of the bound depends on the “average” number of qubits that will occupy the
$|1\rangle$ state during the computation: we choose this proportion to be 2/3, giving a
bound:


$$nt < 6\tau_0/L \qquad (14.61)$$
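

To get a feel for the scale of this bound, the sketch below evaluates the time budget
$6\tau_0/L$ for the metastable lifetimes quoted earlier. It is an illustration only:
the function names are hypothetical, the register size of 30 ions is an arbitrary
example, and Yb⁺ is left out because its quoted lifetime is of order ten years.

```python
# Metastable-state lifetimes quoted in the text, in seconds.
LIFETIMES_S = {"Hg+": 0.1, "Sr+": 0.4, "Ca+": 1.0, "Ba+": 60.0}


def time_budget(tau0, num_qubits):
    """Upper limit on the total pulse time n*t from the bound n*t < 6*tau0/L."""
    return 6.0 * tau0 / num_qubits


def max_pulses(tau0, num_qubits, pulse_duration):
    """Approximate number of pulses of the given duration that fit in the budget."""
    return int(time_budget(tau0, num_qubits) // pulse_duration)


NUM_QUBITS = 30  # arbitrary example register size
for ion, tau0 in LIFETIMES_S.items():
    budget = time_budget(tau0, NUM_QUBITS)
    print(f"{ion}: total pulse time below {budget:.3g} s for {NUM_QUBITS} qubits")
```

The budget shrinks in proportion to the number of qubits, so adding ions to the
register reduces the time available for the computation as a whole.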


So we see that “more” computation can be performed if the logic gate time, $t$, can
be reduced. The duration, $t$, of a $\pi$-pulse (proportional to $\Omega^{-1}$) is determined by
the intensity, $I$, of the laser field: $t \sim I^{-1/2}$. However, $t$ cannot be made arbitrarily
small. In an earlier paper we showed that $t$ cannot be smaller than the period of the
CM motion, and shorter periods require stronger axial potentials that push the ions
closer together [27]. The shortest possible gate time then corresponds to a minimum
ion spacing of one wavelength of the interrogating laser light. In this paper we will
consider a different mechanism that gives comparable limits: the breakdown of the
two-level approximation in intense laser fields, first considered in Reference 57.


Fig. 14.6. Decoherence induced by breakdown of the two-level approximation.


In addition to the two states comprising each qubit, there are other, “extraneous”
ionic levels with higher energies than the $|1\rangle$ state that have rapid electric
dipole transitions (lifetime $\tau_{ex}$) to the ground state, and so if some population is
transferred to such states during computation their rapid decay will destroy quantum
coherence (see Figure 14.6). Although the driving laser frequency is far off-resonance
(detuning $\Delta$) from the transition frequency between $|0\rangle$ and a higher
lying extraneous level, in intense laser fields there will be some probability, $P_{ex}$, of
occupying this level, given by

$$P_{ex} \approx \frac{\Omega_{ex}^2}{8\Delta^2} \qquad (14.62)$$

where $\Omega_{ex}$ is the Rabi frequency for the transition from the ground state, $|0\rangle$, to
the higher lying, extraneous level.


Therefore, the probability of decoherence through this two-level breakdown is
proportional to the laser intensity, $I$. By requiring that the probability of photon
emission from a third level should be less than one during the computation, we
obtain the following inequality


$$nt\,\frac{\Omega_{ex}^2}{8\Delta^2\tau_{ex}} < 1 \qquad (14.63)$$


This inequality sets an upper bound on the laser intensity. From the two inequalities
14.61 and 14.63 we obtain the bound


$$nL < 4\,(\ldots)^{1/2}\,(\ldots)^{3/2}\,\tau_{ex}\Delta \qquad (14.64)$$
between an algorithmic quantity (left-hand side) and a physics parameter (right-hand
side), where $\lambda_0$ is the wavelength of the $|0\rangle$-$|1\rangle$ transition, and $\lambda_{ex}$ is the
wavelength of the transition from the extraneous level to the $|0\rangle$ state. Using “typical”
values of $\tau_{ex} \sim 10^{-8}$ s and $\Delta \sim 10^{15}$ Hz we see that the value of the right-hand-side
of this inequality is ~ 7.10’, translating into enough time to perform a very large 
number (n ~ 10° — —10°) of logic operations on tens of qubits. (The Lamb-Dicke 
parameter, $\eta$, for these ions will be ~ 0.01-0.1.)


The inequality (14.64) suggests that longer wavelength qubit transitions allow
more computation. Indeed, for specific ions we obtain the bounds: 


Hgt : nL <.3x 10 
Srt :nL <.7x 10" 
Cat :nL <n.1~x 10° 
Bat :nL <7.5 x 10° 


suggesting that Ba⁺ ions may offer greater computational potential than Hg⁺ or
Ca⁺. However, with $L \sim 60$ qubits the bound (14.64) in Ba⁺ corresponds to a
computational time $6\tau_0/L \sim 6$ s, whereas technical sources of decoherence such as ion
heating and laser phase stability are likely to limit the computation before this limit
is reached. Therefore, Ba⁺ ions are not likely to offer any significant computational
advantage over Ca⁺ at present.


When qubits are represented by Zeeman or hyperfine sublevels of an ion’s ground 
state, Raman transitions would be used to drive the computational operations, 
detuned by an amount A below some third level (lifetime 7,) (see Figure 14.7). The 
Rabi frequency for Raman transitions is proportional to the laser field intensity, 


$$\Omega \sim I/\Delta \qquad (14.65)$$

as is the decoherence process of spontaneous emission from the third level,

$$P \sim I/\Delta^2 \qquad (14.66)$$


Hence, the probability of a successful computational result is independent of how
quickly the computation is performed (at least from the perspective of this decoherence
mechanism). Therefore, Raman transitions offer the possibility of completing a
computation before technical decoherence mechanisms, such as ion heating, become 
significant. Using similar arguments as in the last section, we can derive the follow- 
ing inequality for quantum algorithm parameters in terms of the physics parameters 
for Raman qubits: 


nL? < 8nnA (14.67) 


The right-hand side of this inequality has a typical value ~ 7.5x10° which is ade- 
quate for a large number of gate operations (n ~ 10°) on tens of qubits. With equal 
numbers of qubits, the error probability per gate is lower for the Raman transitions 
than for metastable qubits. 
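

The claim that the Raman success probability does not depend on how fast the gates
are driven can be illustrated with the scalings alone. The sketch below is a toy
calculation, not a model of any particular ion: the proportionality constants
`k_rabi` and `k_decay` are arbitrary assumptions, and the decoherence scaling is
treated as a rate that acts for the duration of one $\pi$-pulse. For comparison it also
evaluates the metastable-qubit case, where the Rabi frequency grows only as the
square root of the intensity while the two-level breakdown grows linearly.

```python
import math


def raman_error_per_pi_pulse(intensity, detuning, k_rabi=1.0, k_decay=1.0):
    """Decoherence accumulated during one pi-pulse on a Raman qubit."""
    rabi = k_rabi * intensity / detuning             # Rabi frequency ~ I / Delta
    decay_rate = k_decay * intensity / detuning**2   # scattering ~ I / Delta^2
    return decay_rate * (math.pi / rabi)


def metastable_error_per_pi_pulse(intensity, detuning, k_rabi=1.0, k_decay=1.0):
    """Same quantity for a metastable qubit driven on a single-photon transition."""
    rabi = k_rabi * math.sqrt(intensity)             # Rabi frequency ~ sqrt(I)
    decay_rate = k_decay * intensity / detuning**2   # two-level breakdown ~ I
    return decay_rate * (math.pi / rabi)


for intensity in (1.0, 100.0):
    raman = raman_error_per_pi_pulse(intensity, detuning=10.0)
    metastable = metastable_error_per_pi_pulse(intensity, detuning=10.0)
    print(f"I = {intensity:6.1f}: Raman {raman:.4f}, metastable {metastable:.4f}")
```

In this toy model the Raman figure is unchanged when the intensity rises a
hundredfold, whereas the metastable figure rises tenfold, in line with the argument
above.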


It is also possible to express the computational bounds on trapped ions in terms 
of typical atomic values of lifetimes, wavelengths, etc. [58]. However, the bounds
obtained this way are considerably more pessimistic than the ones above because 
real ions have much longer lived metastable levels than is suggested by the atomic 
unit of electric quadrupole moment, for instance. Therefore, although indicative of 
the amount of computation possible, this approach does not provide an absolute 
upper bound on computational capacity in terms of fundamental constants [59]. 


To translate the above physics bounds on algorithmic quantities into limits on
the size of integer that could be factored, it is necessary to determine the computational
space and time requirements of quantum factoring. Using the values [50]


$$L = 5\ell + 4, \qquad n = 96\ell^3 + O(\ell^2)$$


(where $n$ is the number of U pulses) in the decoherence bounds above, we obtain
the (algorithm-dependent) factoring limits ($\eta = 0.01$):


Hg⁺: $\ell < 5$ bits
Sr⁺: $\ell < 6$ bits
Ca⁺: $\ell < 6$ bits
Ba⁺: $\ell < 10$ bits
Yb⁺: $\ell < 5$ bits


with metastable qubits. (Larger values may be possible with Raman qubits provided
a careful optimization of the parameters is made.) These limits correspond roughly
to the size of computation at which the probability of success has fallen to 1/e.
Larger integers could be factored but with a lower success probability. Certainly,
these projections of the intrinsic factoring capacity of ion trap QCs are insignificant
in comparison with the size of integers that are used in cryptography. Nevertheless,
the 6-bit factoring limit with Ca⁺ ions (for instance) represents ~20,000 U-pulses
applied to 34 qubits, taking ~0.2 s, representing ample opportunity for studying
practical small-scale quantum computation. Also, the above limits do not take into
account any possible gains from the use of quantum error correction. We note that
the total computational time with metastable qubits is ~$6\tau_0/L$, so that it might
be possible to reduce the computation time by using an algorithm with additional
qubits but fewer gate operations.
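

The figures quoted for the Ca⁺ example can be reproduced from the resource counts.
The sketch below reads those counts as $L = 5\ell + 4$ qubits and $n \approx 96\ell^3$
U-pulses for an $\ell$-bit integer, dropping the lower-order correction, and uses the
1 s lifetime quoted for Ca⁺. The function name is hypothetical.

```python
def factoring_resources(bits):
    """Leading-order qubit and U-pulse counts for factoring a `bits`-bit integer."""
    qubits = 5 * bits + 4
    u_pulses = 96 * bits**3      # lower-order O(bits^2) terms are dropped
    return qubits, u_pulses


CA_LIFETIME_S = 1.0              # metastable lifetime quoted for Ca+

qubits, u_pulses = factoring_resources(6)
total_time = 6.0 * CA_LIFETIME_S / qubits     # time budget 6*tau0/L
print(f"6-bit integer: {qubits} qubits, {u_pulses} U-pulses")
print(f"time budget: {total_time:.2f} s, i.e. {total_time / u_pulses * 1e6:.1f} "
      f"microseconds per U-pulse")

big_qubits, big_pulses = factoring_resources(512)
print(f"512-bit integer: {big_qubits} qubits, about {big_pulses:.1e} U-pulses")
```

For six bits this gives 34 qubits and 20,736 pulses within a budget of about 0.18 s,
matching the rounded figures in the text. The same counts applied to a 512-bit
integer give 2,564 qubits and of the order of $10^{10}$ pulses, which shows how far
these intrinsic limits are from cryptographically relevant sizes.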


These estimates show that decoherence will inevitably limit the size of quantum 
computation that can be performed with trapped ions. This raises the question of 
whether the errors caused by decoherence can be corrected. At first sight this does 
not appear to be promising because we know that we cannot perform measurements 
during computation without destroying the quantum information we are trying to 
protect. Also, we know that it is not possible to faithfully duplicate an arbitrary 
quantum state [60]. Nevertheless it is now known that quantum error correction is 
possible by encoding a single logical qubit as an entangled state of multiple physical 
qubits [30, 31]. Errors can be detected by performing quantum logical operations
involving the physical qubits and various ancilla qubits, whose state is subsequently 
measured and the result used to apply corrective unitary operations. Furthermore, 
it is also known that quantum error correction itself can be performed in a fault 
tolerant fashion [32], and that if certain thresholds in the precision of gate operations 
can be attained, these techniques allow indefinite computation in principle [33]. To 
do full justice to these remarkable results would require an entire article, but we note 
that they are of the form of an existence proof and that it remains a challenge to 
determine how to best implement quantum error correction methods for the current 
hardware schemes. 


There are distinct contrasts between the computational bounds for ion trap 
quantum computation and the recently proposed NMR quantum computation model 
[61]. Ion trap qubit coherence is limited by spontaneous emission processes whereas 
NMR qubit decoherence is thermally dominated ($kT \gg h\nu$). Ion trap quantum
information is consequently more robust. Furthermore, gate times in an ion trap
QC could be as short as 1 μs (set by achievable laser intensities, and two-level
breakdown), whereas NMR gate times will typically be ~0.1-1 s (set by the strength
of spin-spin interactions and the need to avoid crosstalk with unintended qubits). 
Readout in an NMR QC is problematic, with an exponential reduction in magneti- 
zation signal with additional qubits, whereas ion trap QC readout is a robust process 
independent of the number of qubits involved. Moreover, an ion trap QC has the 
advantage that logic operations can be performed between arbitrary qubits in the 
register, whereas in NMR only nearest-neighbor operations are possible. Therefore, 
computation in an NMR QC would use much of the available coherence time in 
moving qubits around the register until they are adjacent to each other. We esti- 
mate that a realistic bound to the computation possible in an NMR QC is about 10 
qubits and 100 logic operations. Of these 100 operations many would be used in a 
typical computation to move separated qubits until they are adjacent. Nevertheless, 
NMR quantum computation is interesting in its own right, and especially because 
it allows investigations of small quantum computations with existing hardware.


14.7 Summary and Conclusions 


In this paper we have surveyed the remarkable developments in quantum computa- 
tion that have taken place since Feynman’s pioneering work in the field. For certain 
problems quantum entanglement and multiparticle interference can be exploited to 
allow much more efficient solutions by quantum parallelism than by conventional 
computation. We have discussed the cryptographic significance of quantum compu- 
tation, and we have seen that to factor a 512-bit integer (for example) would require 
a quantum computer in which the quantum state of a 2,564-qubit register could be 
controlled in its $2^{2564}$-dimensional Hilbert space, while of the order of $10^{10}$
quantum logic operations are performed. We have also seen that with trapped ions we
can envision controlling the quantum state of ~50 qubits in their $2^{50}$-dimensional
Hilbert space for ~ 10° quantum logic operations. Current experiments are only 
now proceeding beyond a single logic operation on a pair of qubits [62]. Given the 
enormous disparity between the current state of the art of quantum computation 
experiments and the requirements for quantum factoring of interesting numbers, it 
would therefore be easy to dismiss quantum computation as irrelevant for cryptog- 
raphy. However, as we have seen in Section 14.3, it is the possibility that quantum 
computation might become possible in 20 or 30 years’ time (say) that must be
seriously considered today. Cryptography will therefore be a compelling motivation
for the development of quantum computation research. But quantum computa- 
tion will also open up a wide variety of important fundamental experiments in the 
foundations of quantum mechanics because a quantum computer allows arbitrary 
quantum states to be created from a small set of primitive operations. 


We have surveyed the prospects for and limitations to quantum computation 
with trapped ions. It is apparent that with existing technology, adequate time 
scales and capacity to hold multiple qubits are available to explore quantum com- 
putation well beyond the current state-of-the-art. These intrinsic limits (without 
quantum error correction) only correspond to the factoring of small integers. How- 
ever, the numbers of qubits and logic operations involved are large enough that this 
technology is well worth pursuing. Ion traps will therefore be a potent method for 
exploring whether superpositions and entangled states of large numbers of qubits 
can be created. Investigations of the type studied here identify the relevant physics 
issues that must be addressed to achieve computational gains. In particular, we note 
that there has yet to be a demonstration that more than one ion can be sideband 
cooled to the vibrational ground state. Furthermore, the heating mechanisms for 
this vibrational mode are poorly understood [63, 64]. Studies of sideband cooling 
and reheating of multiple ions will therefore be crucial to the development of ion 
trap QCs. Once entangled states of three or more qubits can be constructed it will 
also be possible to determine whether multiparticle decoherence mechanisms are 
consistent with the model that we have used. 


We are probably now entering the “vacuum tube” era of quantum computation.
We should expect that many of the technologies now being pursued for quantum
computation will be superseded by even more promising ideas. As experiments on
larger and larger quantum registers are developed we will have to face the fundamental
dichotomy between classical and quantum physics. Perhaps we will encounter
some failure of conventional quantum mechanics [65, 66], but in my opinion it is
more likely that experimental progress will be limited by the daunting technical 
challenges. In any event the future will be exciting for both quantum physics and 
computation. As in a number of other cases, Richard Feynman’s unique physical 
insights have opened up an entirely new field of research. 


Postscript 


As a postdoc at Caltech from 1980 to 1982 I was fortunate to have an office close 
to Feynman’s, and so I was privileged to experience his unique approach to physics
at first hand. I discovered that even the topics he did not consider worthy of pub- 
lication in the physics literature were often more fascinating than many papers in 
print. Subsequently I derived immense pleasure from explaining one of his scientific 
“tricks,” known as “Feynman’s proof of Maxwell’s equations,” which remained un- 
published until the story was recounted by Dyson [67]. Feynman had shown Dyson 
that two of the four Maxwell’s equations could be “derived” starting only from the 
Heisenberg equations of motion of a non-relativistic quantum mechanical particle, 
subject to a force of a very general form. That so much could be derived from so 
little seemed miraculous, especially considering the relativistic nature of Maxwell’s 
equations. I simply had to get to the bottom of the problem [68]. I found that 
Feynman’s argument was essentially a rediscovery of some little-known consistency 
conditions on generalized forces that can be accommodated in classical Lagrangian 
mechanics, first discovered by Helmholtz in the 19th Century [69]. By understand- 
ing Feynman’s “trick” I found I had developed new insights into features of classical 
mechanics that we often take for granted. 


Parallel Computation


15 


COMPUTING MACHINES IN THE FUTURE 


Richard P. Feynman 
Nishina Memorial Lecture, August 9, 1985 


15.1 Introduction 


It’s a great pleasure and an honor to be here as a speaker in memorial for a scientist 
that I have respected and admired as much as Professor Nishina. To come to Japan 
and talk about computers is like giving a sermon to Buddha. But I have been 
thinking about computers and this is the only subject I could think of when invited 
to talk. 


The first thing I would like to say is what I am not going to talk about. I want 
to talk about the future computing machines. But the most important possible de- 
velopments in the future are things that I will not speak about. For example, there 
is a great deal of work to try to develop smarter machines, machines which have a 
better relationship with humans so that input and output can be made with less 
effort than the complex programming that’s necessary today. This often goes under 
the name of artificial intelligence but I don’t like that name. Perhaps the unintelli- 
gent machines can do even better than the intelligent ones. Another problem is the 
standardization of programming languages. There are too many languages today, 
and it would be a good idea to choose just one. (I hesitate to mention that in Japan, 
for what will happen will be that there will simply be more standard languages — 
you already have four ways of writing now and attempts to standardize anything 
here result apparently in more standards and not fewer!) Another interesting future 
problem that is worth working on but I will not talk about, is automatic debugging 
programs. Debugging means fixing errors in a program or in a machine and it is 
surprisingly difficult to debug programs as they get more complicated. Another 
direction of improvement is to make physical machines three dimensional instead 
of all on a surface of a chip. That can be done in stages instead of all at once — 
you can have several layers and then add many more layers as the time goes on. 
Another important device would be one that could automatically detect defective 
elements on a chip and then the chip automatically rewire itself so as to avoid the 
defective elements. At the present time when we try to make big chips there are 
often flaws or bad spots in the chips, and we throw the whole chip away. If we 
could make it so that we could use the part of the chip that was effective, it would 
be much more efficient. I mention these things to try to tell you that I am aware 
of what the real problems are for future machines. But what I want to talk about 
is simple, just some small technical, physically good things that can be done in 
principle according to the physical laws. In other words, I would like to discuss the
machinery and not the way we use the machines. 


I will talk about some technical possibilities for making machines. There will 
be three topics. One is parallel processing machines which is something of the very 
near future, almost present, that is being developed now. Further in the future is 
the question of the energy consumption of machines which seems at the moment 
to be a limitation, but really isn’t. Finally I will talk about the size. It is always 
better to make the machines smaller, and the question is how much smaller is it 
still possible, in principle, to make machines according to the laws of Nature. I will 
not discuss which and what of these things will actually appear in the future. That 
depends on economic problems and social problems and I am not going to try to 
guess at those. 


15.2 Parallel Computers 


The first topic concerns parallel computers. Almost all the present computers, con- 
ventional computers, work on a layout or an architecture invented by von Neumann, 
in which there is a very large memory that stores all the information, and one cen- 
tral location that does simple calculations. We take a number from this place in the 
memory and a number from that place in the memory, send the two to the central 
arithmetical unit to add them and then send the answer to some other place in 
the memory. There is, therefore, effectively one central processor which is working 
very very fast and very hard while the whole memory sits out there like a vast 
filing cabinet of cards which are very rarely used. It is obvious that if there were 
more processors working at the same time we ought to be able to do calculations 
faster. But the problem is that someone who might be using one processor may be 
using some information from the memory that another one needs, and it gets very 
confusing. For such reasons it has been said that it is very difficult to get many 
processors to work in parallel. 


Some steps in that direction have been taken in the larger conventional machines 
called “vector processors”. When sometimes you want to do exactly the same step 
on many different items you can perhaps do that at the same time. The hope is 
that regular programs can be written in the ordinary way, and then an interpreter 
program will discover automatically when it is useful to use this vector possibility. 
That idea is used in the Cray and in “supercomputers” in Japan. Another plan is 
to take what is effectively a large number of relatively simple (but not very simple) 
computers, and connect them all together in some pattern. Then they can all work 
on a part of the problem. Each one is really an independent computer, and they 
will transfer information to each other as one or another needs it. This kind of a 
scheme is realised in the Caltech Cosmic Cube, for example, and represents only 
one of many possibilities. Many people are now making such machines. Another 
plan is to distribute very large numbers of very simple central processors all over 
the memory. Each one deals with just a small part of the memory and there is an
elaborate system of interconnections between them. An example of such a machine 
is the Connection Machine made at MIT. It has 64,000 processors and a system 
of routing in which every 16 can talk to any other 16 and thus has 4000 routing 
connection possibilities. 


It would appear that scientific problems such as the propagation of waves in 
some material might be very easily handled by parallel processing. This is because 
what happens in any given part of space at any moment can be worked out locally 
and only the pressures and the stresses from the neighboring volumes need to be 
known. These can be worked out at the same time for each volume and these 
boundary conditions communicated across between the different volumes. That’s 
why this type of design works for such problems. It has turned out that a very 
large number of problems of all kinds can be dealt with in parallel. As long as the 
problem is big enough so that a lot of calculating has to be done, it turns out that 
a parallel computation can speed up time to solution enormously and this principle 
applies not just to scientific problems. 
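
The locality argument can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the lecture: the function names (`step_serial`, `step_chunk`, `step_decomposed`, `run`), the one-dimensional grid, the leapfrog update rule and the parameter values are all assumptions chosen for brevity. Each chunk of the grid is updated from its own cells plus a single boundary value from each neighbor, and the result agrees with the ordinary serial update. The chunks are processed one after another here; on a parallel machine each chunk could be handed to a separate processor.

```python
def step_serial(prev, curr, c2):
    """One leapfrog time step of a 1-D wave equation with fixed ends."""
    n = len(curr)
    nxt = [0.0] * n
    for i in range(1, n - 1):
        nxt[i] = 2 * curr[i] - prev[i] + c2 * (curr[i + 1] - 2 * curr[i] + curr[i - 1])
    return nxt


def step_chunk(prev_chunk, curr_chunk, left_halo, right_halo, c2):
    """Update one chunk using only its own cells plus one neighbor value per side."""
    padded = [left_halo] + curr_chunk + [right_halo]
    return [
        2 * padded[j] - prev_chunk[j - 1] + c2 * (padded[j + 1] - 2 * padded[j] + padded[j - 1])
        for j in range(1, len(padded) - 1)
    ]


def step_decomposed(prev, curr, c2, n_chunks):
    """Same time step, computed chunk by chunk (assumes len(curr) >= 3)."""
    n = len(curr)
    size = -(-(n - 2) // n_chunks)           # ceiling division over interior cells
    nxt = [0.0] * n
    for start in range(1, n - 1, size):
        stop = min(start + size, n - 1)
        # Only curr[start - 1] and curr[stop] cross the chunk boundary.
        nxt[start:stop] = step_chunk(
            prev[start:stop], curr[start:stop], curr[start - 1], curr[stop], c2
        )
    return nxt


def run(step, n=32, steps=50, c2=0.25, **kwargs):
    """Propagate an initial pulse in the middle of the grid (hypothetical setup)."""
    prev = [0.0] * n
    prev[n // 2] = 1.0
    curr = list(prev)
    for _ in range(steps):
        prev, curr = curr, step(prev, curr, c2, **kwargs)
    return curr


serial = run(step_serial)
decomposed = run(step_decomposed, n_chunks=4)
print(serial == decomposed)
```

The only data a chunk needs from outside itself is the pair of boundary values, which is the communication between neighboring volumes described in the text.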


What happened to the prejudice of two years ago, which was that parallel
programming is difficult? It turns out that what was difficult, and almost
impossible, is to take an ordinary program and automatically figure out how to use
parallel computation effectively on that program. Instead, one must start all over
again with the problem, appreciating that we have the possibility of parallel
calculation, and rewrite the program completely with a new attitude to what is inside
the machine. It is not possible to effectively use the old programs. They must be
rewritten. That is a great disadvantage to most industrial applications and has met 
with considerable resistance. But the big programs usually belong to scientists or 
other, unofficial, intelligent programmers who love computer science and are willing 
to start all over again and rewrite the program if they can make it more efficient. 
So what’s going to happen is that the hard programs, vast big ones, will be the first 
to be re-programmed by experts in the new way, and then gradually everybody will 
have to come around, and more and more programs will be programmed that way, 
and programmers will just have to learn how to do it. 


15.3 Reducing the Energy Loss 


The second topic I want to talk about is energy loss in computers. The fact that 
they must be cooled is an apparent limitation for the largest computers — a good 
deal of effort is spent in cooling the machine. I would like to explain that this is 
simply a result of very poor engineering and is nothing fundamental at all. Inside 
the computer a bit of information is controlled by a wire which either has a voltage 
of one value or another value. It is called “one bit”, and we have to change the 
voltage of the wire from one value to the other and have to put charge on or take 
charge off. I make an analogy with water: We have to fill a vessel with water to 
get one level or empty it to get to the other level. This is just an analogy; if you
like electricity better you can think more accurately electrically. What we do now 
is analogous, in the water case, to filling the vessel by pouring water in from a top 
level (Fig. 15.1), and lowering the level by opening the valve at the bottom and 
letting it all run out. In both cases there is a loss of energy because of the sudden 
drop in level of the water, through a height from the top level where it comes in, to 
the low bottom level, and also when you start pouring water in to fill it up again. 
In the cases of voltage and charge, the same thing occurs. 


It’s like, as Mr. Bennett has explained, operating an automobile which has to 
start by turning on the engine and stop by putting on the brakes. By turning on 
the engine and then putting on the brakes, each time you lose power. Another way 
to arrange things for a car would be to connect the wheels to flywheels. Now when 
the car stops, the flywheel speeds up thus saving the energy — which can then be 
reconnected to start the car again. The water analog of this would be to have a 
U-shaped tube with a valve in the center at the bottom, connecting the two arms 
of the U (Fig. 15.2). We start with it full on the right but empty on the left with 
the valve closed. If we now open the valve, the water will slip over to the other 
side, and we can close the valve again, just in time to catch the water in the left 
arm. Now when we want to go the other way, we open the valve again and the 
water slips back to the other side and we catch it again. There is some loss and 
the water doesn’t climb as high as it did before, but all we have to do is to put a 
little water in to correct the loss — a much smaller energy loss than the direct fill 
method. This trick uses the inertia of the water and the analogue for electricity 
is inductance. However, it is very difficult with the silicon transistors that we use 
today to make up inductance on the chips. So this technique is not particularly
practical with present technology. 


Another way would be to fill the tank by a supply which stays only a little bit 
above the level of the water, lifting the water supply in time as we fill up the tank 
(Fig. 15.3), so that the dropping of water is always small during the entire effort. In 
the same way, we could use an outlet to lower the level in the tank, but just taking 
water off near the top and lowering the tube so that the heat loss would not appear 
at the position of the transistor, or would be small. The actual amount of loss will 
depend on how large the distance is between the supply and the surface as we fill it
up. This method corresponds to changing the voltage supply with time. So if we
could use a time-varying voltage supply, we could use this method. Of course, there
is energy loss in the voltage supply, but that is all located in one place and there it 
is simple to make one big inductance. This scheme is called “hot clocking”, because 
the voltage supply operates at the same time as the clock which times everything. 
In addition we don’t need an extra clock signal to time the circuits as we do in 
conventional designs. 


Both of these last two devices use less energy if they go slower. If I try to move 
the water supply level too fast, the water in the tube doesn’t keep up with it and 
there ends up being a big drop in water level. So to make the device work I must go
slowly. Similarly, the U-tube scheme will not work unless that central valve can 
open and close faster than the time it takes for the water in the U-tube to slip 
back and forth. So my devices must be slower — I’ve saved an energy loss but I’ve 
made the devices slower. In fact the energy loss multiplied by the time it takes 
for the circuit to operate is constant. But nevertheless, this turns out to be very
practical because the clock time is usually much larger than the circuit time for 
the transistors, and we can use that to decrease the energy. Also if we went, let 
us say, three times slower with our calculations, we could use one third the energy 
over three times the time, which is nine times less power that has to be dissipated. 
Maybe this is worth it. Maybe by redesigning using parallel computations or other 
devices, we can spend a little longer than we could do at maximum circuit speed, 
in order to make a larger machine that is practical and from which we could still 
reduce the energy loss. 
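
The arithmetic behind "three times slower, nine times less power" can be checked with a short sketch. It assumes only the rule stated above, that energy loss multiplied by operating time is constant; the function name `slow_down` and the starting values are hypothetical, and the units are arbitrary.

```python
from fractions import Fraction


def slow_down(energy, time, factor):
    """Scale one switching operation, assuming energy * time stays constant.

    Returns the new (energy, time, power) when the operation is made
    `factor` times slower. Units are arbitrary.
    """
    new_time = time * factor
    new_energy = energy / factor          # keeps energy * time unchanged
    return new_energy, new_time, new_energy / new_time


energy, time = Fraction(1), Fraction(1)   # hypothetical starting values
new_energy, new_time, new_power = slow_down(energy, time, 3)

print(new_energy)                          # energy per operation: one third
print((energy / time) / new_power)         # power reduction factor: nine
```

Exact fractions are used so that the factor of nine comes out without rounding noise.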


For a transistor, the energy loss multiplied by the time it takes to operate is a 
product of several factors (Fig. 15.4): 


1. the thermal energy proportional to temperature, kT; 


2. the length of the transistor between source and drain, divided by the velocity 
of the electrons inside (the thermal velocity $\sqrt{3kT/m}$);


3. the length of the transistor in units of the mean free path for collisions of 
electrons in the transistor; 


4. the total number of the electrons that are inside the transistor when it oper- 
ates. 
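
Written as a single expression, the four factors listed above multiply as follows. The symbols are labels introduced here for illustration and are not notation from the lecture: $E$ is the energy lost per switching operation, $t$ the time the operation takes, $L$ the source-drain length, $\lambda$ the mean free path, $N$ the number of electrons in the transistor and $m$ the electron mass. Only the proportionality is intended; numerical factors of order one are not tracked.

```latex
E \, t \;\sim\; kT \times \frac{L}{v_{\mathrm{th}}} \times \frac{L}{\lambda} \times N,
\qquad v_{\mathrm{th}} = \sqrt{\frac{3kT}{m}}
```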


Putting in appropriate values for all of these numbers tells us that the energy
used in transistors today is somewhere between a billion and ten billion or more times
the thermal energy kT. When the transistor switches we use that much energy. This
is a very large amount of energy. It is obviously a good idea to decrease the size of the
transistor. We decrease the length between source and drain and we can decrease 
the number of the electrons, and so use much less energy. It also turns out that a 
smaller transistor is much faster, because the electrons can cross it faster and make 
their decisions to switch faster. For every reason, it is a good idea to make the
transistor smaller, and everybody is always trying to do that. 


But suppose we come to a circumstance in which the mean free path is longer 
than the size of the transistor, then we discover that the transistor doesn’t work 
properly any more. It does not behave the way we expected. This reminds me, 
years ago there was something called the sound barrier. Airplanes were supposed 
not to be able to go faster than the speed of sound because, if you designed them 
normally and then tried to put the speed of sound in the equations, the propeller 
wouldn’t work and the wings wouldn’t lift and nothing would work correctly. Nevertheless,
airplanes can go faster than the speed of sound. You just have to know what the 
right laws are under the right circumstances, and design the device with the correct 
laws. You cannot expect old designs to work in new circumstances. But new designs 
can work in new circumstances, and I assert that it is perfectly possible to make 
transistor systems, or more correctly, switching systems and computing devices in 
which the dimensions are smaller than the mean free path. I speak of course, ‘in 
principle’, and I am not speaking about the actual manufacture of such devices. 
Let us therefore discuss what happens if we try to make the devices as small as 
possible. 


15.4 Reducing the Size 


So my third topic is the size of computing elements and now I speak entirely theo- 
retically. The first thing that you would worry about when things get very small, is 
Brownian motion — everything is shaking about and nothing stays in place. How 
can you control the circuits then? Furthermore, if a circuit does work, doesn’t it 
now have a chance of accidentally jumping back? If we use two volts for the energy 
of this electric system, which is what we ordinarily use (Fig. 15.5), that is eighty
times the thermal energy at room temperature ($kT = 1/40$ volt), and the chance that
something jumps backward against 80 times thermal energy is $e$, the base of the
natural logarithm, to the power minus eighty, or $10^{-47}$. What does that mean? If
we had a billion transistors in a computer (which we don’t yet have), all of them
switching $10^{10}$ times a second (a switching time of a tenth of a nanosecond),
switching perpetually, operating for $10^{9}$ seconds, which is 30 years, the total number
of switching operations in such a machine is $10^{28}$. The chance of one of the
transistors going backward is only $10^{-47}$, so there will be no error produced by thermal
oscillations whatsoever in 30 years. If you don’t like that, use 2.5 volts and then 
the probability gets even smaller. Long before that, real failures will come when a 
cosmic ray accidentally goes through the transistor, and we don’t have to be more 
perfect than that. 
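
The estimate can be reproduced with a short calculation. This is a sketch under the lecture's own round numbers (2 volts, $kT$ = 1/40 volt, a billion transistors, a tenth of a nanosecond per switch, 30 years); the function name `thermal_error_budget` and its parameter names are hypothetical. Evaluated directly, $e^{-80}$ comes to about $1.8 \times 10^{-35}$, a less extreme figure than the one quoted, but the expected number of thermal errors over the whole lifetime is still far below one, so the conclusion is unchanged.

```python
import math


def thermal_error_budget(volts=2.0, kt_per_volt=40.0,
                         transistors=1e9, switch_rate_hz=1e10, seconds=1e9):
    """Estimate thermally induced switching errors over a machine's lifetime.

    kt_per_volt=40 encodes the lecture's figure kT = 1/40 volt.
    """
    barrier_in_kt = volts * kt_per_volt            # 2 V corresponds to 80 kT
    p_backward = math.exp(-barrier_in_kt)          # Boltzmann factor per switching event
    total_switches = transistors * switch_rate_hz * seconds
    return barrier_in_kt, p_backward, total_switches, p_backward * total_switches


barrier, p_backward, total, expected_errors = thermal_error_budget()
print(f"barrier = {barrier:.0f} kT, p = {p_backward:.2e}")
print(f"switches = {total:.1e}, expected errors = {expected_errors:.1e}")
```

Calling `thermal_error_budget(volts=2.5)` raises the barrier to 100 $kT$ and makes the probability smaller still, as the text remarks.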


However, much more is in fact possible and I would like to refer you to an article 
in a most recent Scientific American by C. H. Bennett and R. Landauer [“The 
Fundamental Physical Limits of Computation”, Sci. Am. July 1985; Japanese
Transl.— SAIENSU, Sept. 1985]. It is possible to make a computer in which 
each element, each transistor, can go forward and accidentally reverse and still 
the computer will operate. All the operations in the computer can go forward or 
backward. The computation proceeds for a while one way and then it undoes itself, 
‘uncalculates’, and then goes forward again and so on. If we just pull it along a 
little, we can make this computer go through and finish the calculation by making 
it just a little bit more likely that it goes forward than backward. 


It is known that all possible computations can be made by putting together 
some simple elements like transistors; or, if we want to be more logically abstract, 
something called a NAND gate, for example (NAND means NOT-AND). A NAND 
gate has two “wires” in and one out (Fig. 15.6). Forget the NOT for the moment. 
What is an AND gate? An AND gate is a device whose output is 1 only if both 
input wires are 1, otherwise its output is 0. NOT-AND means the opposite, thus 
the output wire reads 1 (i.e. has the voltage level corresponding to 1) unless both 
input wires read 1; if both input wires read 1, then the output wire reads 0 (i.e. 
has the voltage level corresponding to 0). Figure 15.6 shows a little table of inputs 
and outputs for such a NAND gate. A and B are inputs and C is the output. If 
A and B are both 1, the output is 0, otherwise 1. But such a device is irreversible: 
Information is lost. If I only know the output, I cannot recover the input. The 
device can’t be expected to flip forward and then come back and compute correctly 
anymore. For instance, if we know that the output is now 1, we don’t know whether 
it came from A=0, B=1 or A=1, B=0 or A=0, B=0 and it cannot go back. Such a 
device is an irreversible gate. The great discovery of Bennett and, independently, of 
Fredkin, is that it is possible to do computation with a different kind of fundamental 
gate unit, namely, a reversible gate unit. I have illustrated their idea — with a unit 
which I could call a reversible NAND gate. It has three inputs and three outputs
(Fig. 15.7). Of the outputs, two, A’ and B’, are the same as two of the inputs,
A and B, but the third output works this way: C’ is the same as C unless A and
B are both 1, in which case it changes whatever C is. For instance, if C is 1 it is
changed to 0, if C is 0 it is changed to 1, but these changes only happen if both
A and B are 1. If you put two of these gates in succession, you see that A and B
will go through, and if C’ is not changed in both it stays the same. If C’ is changed,
it is changed twice so that it stays the same. So this gate can reverse itself and no
information has been lost. It is possible to discover what went in if you know what
came out.
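
The behavior of this gate is easy to check exhaustively. The following is a minimal sketch, with the function names `reversible_nand` and `nand` chosen here for illustration; the gate it models is the one described above (commonly known as the Toffoli or controlled-controlled-NOT gate). It confirms that applying the gate twice restores every input, that no two inputs share an output, and that fixing C = 1 makes C’ the ordinary NAND of A and B.

```python
from itertools import product


def reversible_nand(a, b, c):
    """Pass A and B through unchanged; flip C only when A and B are both 1."""
    return a, b, c ^ (a & b)


all_inputs = list(product((0, 1), repeat=3))

# Two gates in succession undo each other, so the gate is its own inverse.
self_inverse = all(reversible_nand(*reversible_nand(*bits)) == bits for bits in all_inputs)

# Eight distinct inputs give eight distinct outputs: no information is lost.
distinct_outputs = len({reversible_nand(*bits) for bits in all_inputs})


def nand(a, b):
    """Ordinary NAND, read off the third output with C preset to 1."""
    return reversible_nand(a, b, 1)[2]


print(self_inverse, distinct_outputs)
print([(a, b, nand(a, b)) for a, b in product((0, 1), repeat=2)])
```

Unlike the plain NAND gate, the extra outputs A’ and B’ carry along exactly the information needed to run the step backward.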


A device made entirely with such gates will make calculations if everything
moves forward, but if things go back and forth for a while and then eventually
go forward enough, it still operates correctly. If the things flip back and then go
forward later it is still all right. It’s very much the same as a particle in a gas which
is bombarded by the atoms around it. Such a particle usually goes nowhere, but
with just a little pull, a little prejudice that makes the chance of moving one way a
little higher than the other way, the thing will slowly drift forward and travel from
one end to the other, in spite of the Brownian motion that it has made. So our
computer will compute provided we apply a drift force to pull the thing across the 
calculation. Although it is not doing the calculation in a smooth way, nevertheless, 
calculating like this, forward and backward, it eventually finishes the job. As with 
the particle in the gas, if we pull it very slightly, we lose very little energy, but it 
takes a long time to get to one side from the other. If we are in a hurry, and we 
pull hard, then we lose a lot of energy. It is the same with this computer. If we 
are patient and go slowly, we can make the computer operate with practically no 
energy loss, even less than kT per step, any amount as small as you like if you have 
enough time. But if you are in a hurry, you must dissipate energy, and again it’s 
true that the energy lost to pull the calculation forward to complete it multiplied 
by the time you are allowed to make the calculation is a constant. 
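
The picture of a calculation drifting forward can be imitated with a biased random walk. This is a toy sketch, not a physical model: the function name `drift_computation`, the forward probabilities, the seed and the rule that the walk cannot back up past its starting point are all assumptions made for illustration. Each move either performs the next step of the calculation or undoes the last one, and the calculation is finished when all steps have been performed.

```python
import random


def drift_computation(n_steps, p_forward, seed=0):
    """Count the moves a biased walk needs to complete n_steps of calculation.

    p_forward is the chance that a move goes forward; otherwise the last
    step is undone ("uncalculated"), except at the very start.
    """
    rng = random.Random(seed)
    position = 0
    moves = 0
    while position < n_steps:
        if rng.random() < p_forward:
            position += 1
        elif position > 0:
            position -= 1
        moves += 1
    return moves


# A gentle pull finishes the job, but takes many more moves than a hard pull.
gentle = drift_computation(1000, 0.55)
hard = drift_computation(1000, 0.90)
print(gentle, hard)
```

Away from the starting point the average progress per move is `2 * p_forward - 1`, so the number of moves grows roughly as `n_steps / (2 * p_forward - 1)` as the pull is weakened. That is the time side of the trade between time and energy loss described above.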


With these possibilities in mind, let’s see how small we can make a computer.
How big must a number be? We all know we can write numbers in base 2 as strings
of “bits”, each a one or a zero. But how small can I write? Surely only one atom is
needed to be in one state or another to determine if it represents a one or a zero. And
the next atom could be a one or a zero, so a little string of atoms is enough to hold
a number, one atom for each bit. (Actually, since an atom can have more than just
two states, we could use even fewer atoms, but one per bit is little enough!) So, for
intellectual entertainment, we consider whether we could make a computer in which
the writing of bits is of atomic size, in which a bit is, for example, whether the spin in
the atom is up for 1 or down for 0. And then our ‘transistor’, which changes the bits
in different places, would correspond to some interaction between atoms which will
change their states. The simplest example would be a kind of 3-atom interaction to
be the fundamental element or gate in such a computer. But again, the device won’t
work right if we design it with the laws appropriate for large objects. We must use 
the new laws of physics, quantum mechanical laws, the laws that are appropriate 
to atomic motion (Fig. 15.8). We therefore have to ask whether the principles of 
quantum mechanics permit an arrangement of atoms so small in number as a few 
times the number of gates in a computer that could operate as a computer. This 
has been studied in principle, and such an arrangement has been found. Since the 
laws of quantum mechanics are reversible, we must use the invention by Bennett 
and Fredkin of reversible logic gates. When this quantum mechanical situation is 
studied, it is found that quantum mechanics adds no further limitations to anything 
that Mr. Bennett has said from thermodynamic considerations. Of course there is 
a limitation, the practical limitation anyway, that the bits must be of the size of 
an atom and a transistor 3 or 4 atoms. The quantum mechanical gate I used has 3 
atoms. (I would not try to write my bits onto nuclei, I'll wait till the technological 
development reaches atoms before I need to go any further!) That leaves us just 
with: (a) the limitations in size to the size of atoms; (b) the energy requirements 
depending on the time as worked out by Bennett; and (c) the feature that I did 
not mention concerning the speed of light — we can’t send the signals any faster 
than the speed of light. These are the only physical limitations on computers that 
I know of. 


If we somehow manage to make an atomic size computer, it would mean (Fig.
15.9) that the dimension, the linear dimension, is a thousand to ten thousand times
smaller than those very tiny chips that we have now. It means that the volume of
the computer is 100 billionth or $10^{-11}$ of the present volume, because the volume
of the ‘transistor’ is smaller by a factor $10^{-11}$ than the transistors we make today.
The energy requirement for a single switch is also about eleven orders of magnitude 
smaller than the energy required to switch the transistor today, and the time to make 
the transitions will be at least ten thousand times faster per step of calculation. So 
there is plenty of room for improvement in the computer and I leave this to you, 
practical people who work on computers, as an aim to get to. I underestimated 
how long it would take for Mr. Ezawa to translate what I said, and I have no more 
to say that I have prepared for today. Thank you! I will answer questions if you'd 
like. 


Questions and Answers 


Q: You mentioned that one bit of information can be stored in one atom, and I
wonder if you can store the same amount of information in one quark. 


A: Yes. But we don’t have control of the quarks and that becomes a really 
impractical way to deal with things. You might think that what I am talking about 
is impractical, but I don’t believe so. When I am talking about atoms, I believe that 
some day we will be able to handle and control them individually. There would be so 
much energy involved in the quark interactions that they would be very dangerous 
to handle because of the radioactivity and so on. But the atomic energies that I am 
talking about are very familiar to us in chemical energies, electrical energies, and 
those are numbers that are within the realm of reality, I believe, however absurd it 
may seem at the moment. 


Q: You said that the smaller the computing element is the better. But, I think 
equipment has to be larger, because... 


A: You mean that your finger is too big to push the buttons? Is that what you 
mean? 


Q: Yes, it is. 


A: Of course, you are right. I am talking about internal computers perhaps for 
robots or other devices. The input and output is something that I didn’t discuss, 
whether the input comes from looking at pictures, hearing voices, or buttons being 
pushed. I am discussing how the computation is done in principle and not what 
form the output should take. It is certainly true that the input and the output 
cannot be reduced in most cases effectively beyond human dimensions. It is already 
too difficult to push the buttons on some of the computers with our big fingers. 
But with elaborate computing problems that take hours and hours, they could be 
done very rapidly on the very small machines with low energy consumption. That’s 
the kind of machine I was thinking of. Not the simple applications of adding two 
numbers but elaborate calculations.


Q: I would like to know your method to transform the information from one 
atomic scale element to another atomic scale element. If you will use a quantum 
mechanical or natural interaction between the two elements then such a device will 
become very close to Nature itself. For example, if we make a computer simulation, a 
Monte Carlo simulation of a magnet to study critical phenomena, then your atomic 
scale computer will be very close to the magnet itself. What are your thoughts 
about that? 


A: Yes. All things that we make are Nature. We arrange it in a way to suit our 
purpose, to make a calculation for a purpose. In a magnet there is some kind of 
relation, if you wish, there are some kinds of computations going on, just like there 
are in the solar system, in a way of thinking. But that might not be the calculation 
we want to make at the moment. What we need to make is a device for which we 
can change the programs and let it compute the problem that we want to solve, not 
just its own magnet problem that it likes to solve for itself. I can’t use the solar 
system for a computer unless it just happens that the problem that someone gave 
me was to find the motion of the planets, in which case all I have to do is to watch. 
There was an amusing article written as a joke. Far in the future, the “article” 
appears discussing a new method of making aerodynamical calculations: Instead 
of using the elaborate computers of the day, the author invents a simple device to 
blow air past the wing. (He reinvents the wind tunnel!) 


Q: I have recently read in a newspaper article that operations of the nervous system
in a brain are much slower than present day computers and the unit in the nervous
system is much smaller. Do you think that the computers you have talked about
today have something in common with the nervous system in the brain?


A: There is an analogy between the brain and the computer in that there are 
apparently elements that can switch under the control of others. Nerve impulses 
controlling or exciting other nerves, in a way that often depends upon whether 
more than one impulse comes in — something like an AND or its generalization. 
What is the amount of energy used in the brain cell for one of these transitions? 
I don’t know the number. The time it takes to make a switching in the brain is 
very much longer than it is in our computers even today, never mind the fancy 
business of some future atomic computer, but the brain’s interconnection system 
is much more elaborate. Each nerve is connected to thousands of other nerves, 
whereas we connect transistors only to two or three others. Some people look at 
the activity of the brain in action and see that in many respects it surpasses the 
computer of today, and in many other respects the computer surpasses ourselves. 
This inspires people to design machines that can do more. What often happens 
is that an engineer has an idea of how the brain works (in his opinion) and then 
designs a machine that behaves that way. This new machine may in fact work very 
well. But, I must warn you that that does not tell us anything about how the brain 
actually works, nor is it necessary to ever really know that, in order to make a
computer very capable. It is not necessary to understand the way birds flap their 
wings and how the feathers are designed in order to make a flying machine. It is 
not necessary to understand the lever system in the legs of a cheetah — an animal 
that runs fast — in order to make an automobile with wheels that goes very fast. 
It is therefore not necessary to imitate the behavior of Nature in detail in order to 
engineer a device which can in many respects surpass Nature’s abilities. It is an 
interesting subject and I like to talk about it. Your brain is very weak compared 
to a computer. I will give you a series of numbers, one, three, seven... Or rather, 
ichi, san, shichi, san, ni, go, ni, go, ichi, hachi, ichi, ni, ku, san, go. Now I want 
you to repeat them back to me. A computer can take tens of thousands of numbers 
and give me them back in reverse, or sum them or do lots of things that we cannot 
do. On the other hand, if I look at a face, in a glance I can tell you who it is if I 
know that person, or that I don’t know that person. We do not yet know how to 
make a computer system so that if we give it a pattern of a face it can tell us such 
information, even if it has seen many faces and you have tried to teach it. Another 
interesting example is chess playing machines. It is quite a surprise that we can 
make machines that play chess better than almost everybody in the room. But they 
do it by trying many many possibilities. If he moves here, then I could move here, 
and he can move there, and so forth. They look at each alternative and choose 
the best. Computers look at millions of alternatives, but a master chess player, a 
human, does it differently. He recognizes patterns. He looks at only thirty or forty 
positions before deciding what move to make. Therefore, although the rules are 
simpler in Go, machines that play Go are not very good, because in each position 
there are too many possibilities to move and there are too many things to check and 
the machines cannot look deeply. Therefore the problem of recognizing patterns and 
what to do under these circumstances is the thing that the computer engineers (they 
like to call themselves computer scientists) still find very difficult. It is certainly 
one of the important things for future computers, perhaps more important than the 
things I spoke about. Make a machine to play Go effectively! 
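
The "try every alternative and choose the best" approach described for chess machines can be sketched as a minimax search over a small game tree. This is an illustrative sketch only: the function names `minimax` and `positions`, the nested-list tree and its scores are hypothetical, and the branching factors used at the end (about 35 moves per position for chess, about 250 for Go) are commonly quoted rough figures, not numbers from the lecture.

```python
def minimax(node, maximizing=True):
    """Return (best score, positions examined) for a game tree of nested lists.

    A leaf is a number giving the final score from the first player's point
    of view; an inner node is a list of the positions reachable in one move.
    """
    if not isinstance(node, list):
        return node, 1
    results = [minimax(child, not maximizing) for child in node]
    scores = [score for score, _ in results]
    examined = 1 + sum(count for _, count in results)
    return (max(scores) if maximizing else min(scores)), examined


def positions(branching, depth):
    """Number of move sequences when every position offers `branching` moves."""
    return branching ** depth


# If I move here, he can move there: three moves for me, two replies each.
tree = [[3, 5], [2, 9], [0, 7]]
print(minimax(tree))

# Looking four moves ahead with the assumed branching factors.
print(positions(35, 4), positions(250, 4))
```

The count of positions grows as a power of the number of moves available, which is why exhaustive search that is workable for chess looks only shallowly in Go.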


Q: I think that any method of computation would not be fruitful unless it would 
give a kind of provision on how to compose such devices or programs. I thought 
the Fredkin paper on conservative logic was very intriguing, but once I came to 
think of making a simple program using such devices I came to a halt because 
thinking out such a program is far more complex than the program itself. I think 
we could easily get into a kind of infinite regression because the process of making 
out a certain program would be more complex than the program itself and in trying 
to automate the process, the automating program would be much more complex 
and so on. Especially in this case where the program is hard wired rather than 
being separated as a software. I think it is fundamental to think of the ways of 
composition. 


A: We have some different experiences. There is no infinite regression: it stops 
at a certain level of complexity. The machine that Fredkin ultimately is talking
about and the one that I was talking about in the quantum mechanical case are 
both universal computers in the sense that they can be programmed to do various 
jobs. This is not a hard-wired program. They are no more hard-wired than an 
ordinary computer that you can put information in — the program is a part of the 
input — and the machine does the problem that it is assigned to do. It is hard-wired 
but it is universal like an ordinary computer. These things are very uncertain but I
found an algorithm. If you have a program written for an irreversible machine, the 
ordinary program, then I can convert it to a reversible machine program by a direct 
translation scheme, which is very inefficient and uses many more steps. Then, in 
real situations, the number of steps can be much less. But at least I know that I 
can take a program with $2^n$ steps where it is irreversible, convert it to $3^n$ steps of
a reversible machine. That is many more steps. I did it very inefficiently since I did 
not try to find the minimum — just one way of doing it. I don’t really think that 
we'll find this regression that you speak of, but you might be right. I am uncertain. 


Q: Won’t we be sacrificing many of the merits we were expecting of such devices, 
because those reversible machines run so slow? I am very pessimistic about this 
point. 


A: They run slower, but they are very much smaller. I don’t make it reversible 
unless I need to. There is no point in making the machine reversible unless you 
are trying very hard to decrease the energy enormously, rather ridiculously, because 
with only 80 times kT the irreversible machine functions perfectly. That 80 is much 
less than the present day $10^{9}$ or $10^{10}$ $kT$, so I have at least a $10^{7}$ improvement in
energy to make, and can still do it with irreversible machines! That’s true. That’s 
the right way to go, for the present. I entertain myself intellectually for fun, to ask 
how far could we go in principle, not in practice, and then I discover that I can 
go to a fraction of a kT of energy and make the machines microscopic, atomically 
microscopic. But to do so, I must use the reversible physical laws. Irreversibility 
comes because the heat is spread over a large number of atoms and can’t be gathered 
back again. When I make the machine very small, unless I allow a cooling element 
which is lots of atoms, I have to work reversibly. In practice there probably will 
never come a time when we will be unwilling to tie a little computer to a big 
piece of lead which contains $10^{10}$ atoms (which is still very small indeed) making it
effectively irreversible. Therefore I agree with you that in practice, for a very long 
time and perhaps forever, we will use irreversible gates. On the other hand it is 
a part of the adventure of science to try to find a limitation in all directions and 
to stretch a human imagination as far as possible everywhere. Although at every 
stage it has looked as if such an activity was absurd and useless, it often turns out 
at least not to be useless. 


Q: Are there any limitations from the uncertainty principle? Are there any 
fundamental limitations on the energy and the clock time in your reversible machine 
scheme? 


A: That was my exact point. There is no further limitation due to quantum
mechanics. One must distinguish carefully between the energy lost or consumed 
irreversibly, the heat generated in the operation of the machine, and the energy 
content of the moving parts which might be extracted again. There is a relationship 
between the time and the energy which might be extracted again. But that energy 
which can be extracted again is not of any importance or concern. It would be like 
asking whether we should add the $mc^2$, the rest energy, of all the atoms which are
in the device. I only speak of the energy lost times the time, and then there is no 
limitation. However it is true that if you want to make a calculation at a certain 
extremely high speed, you have to supply to the machine parts which move fast and 
have energy, but that energy is not necessarily lost at each step of the calculation; 
it coasts through by inertia. 


A (to no Q): Could I just say with regard to the question of useless ideas, I'd like 
to add one more. I waited, if you would ask me, but you didn’t. So I will answer it 
anyway. How would we make a machine of such small dimensions where we have to 
put the atoms in special places? Today we have no machinery with moving parts 
whose dimension is extremely small, at the scale of atoms or hundreds of atoms 
even, but there is no physical limitation in that direction either. There is no reason 
why, when we lay down the silicon even today, the pieces cannot be made into little 
islands so that they are movable. We could also arrange small jets so we could 
squirt the different chemicals on certain locations. We can make machinery which 
is extremely small. Such machinery will be easy to control by the same kind of 
computer circuits that we make. Ultimately, for fun again and intellectual pleasure, 
we could imagine machines as tiny as a few microns across, with wheels and cables 
all interconnected by wires, silicon connections, so that the thing as a whole, a very 
large device, moves not like the awkward motions of our present stiff machines but 
in a smooth way of the neck of a swan, which after all is a lot of little machines, the 
cells all interconnected and all controlled in a smooth way. Why can’t we do that 
ourselves? 


Dedicated to Richard Feynman 


Abstract 


Ten years ago, we were all sure that parallel computing technology and the 
interdisciplinary academic field of computational science would be centerpieces of 
both academic and economic growth. We show that this insight was, in principle, 
correct but was an incomplete vision, for large-scale computation implies both 
increased computer power and increasing numbers of users and applications. Parallel 
computing undoubtedly works on essentially all problems, but we were unable to 
produce deployable software systems. Further, few industries could achieve adequate 
return to justify investment in parallel computers, except in a few areas such as 
databases. Computational science is the academic field on the interface of computer 
science with fields such as physics, chemistry, and applied mathematics. This 
expertise allows you to be very useful and, in principle, is an excellent area of 
study, but is not a wise field for many students as employers and universities 
prefer traditional fields. 


We show how parallel computing and computational science have evolved into 
Internetics, which is a vibrant, growing and much larger field that surely does work 
both in principle and in practice. Internetics embodies the technologies and 
expertise used in building large-scale distributed systems and linking fields like 
physics not just with parallel computers, but with the Web of complex heterogeneous 
computers. This is CORBA and Java, and not just MPI and HPF. It is Internetics 
that is the emerging academic field, and not computational science, and Internetics 
is of growing attraction to students and employers. Using an Internetics base, we 
will produce much better software environments for parallel systems, but the 
commercial and academic fields associated with parallelism will not grow in the near 
future. 


We argue that we almost “got it right”: the essential features of the original 
vision were correct and are part of the current broader thrust. 


16.1 Introduction 


In our first book on parallel computing, we joyfully used the well-known fairy tale 
centered on a mirror that could be asked the question, 


“Mirror mirror on the wall — which is the most powerful computer of them all?” 


We thought, in 1987, that the choice was between a microprocessor-based paral- 
lel array, and a traditional vector supercomputer. However, the mirror distorted our 
vision, and what we should have seen was a distributed array, and not just a closely 
coupled parallel simulation, but a complex metaproblem with multiple concurrent 
asynchronous components. There should not be one power user using a single large 
machine, but communities linked by a geographically distributed ensemble that, 
incidentally, could include one or more large parallel systems. 


In our current vision, applications of interest extend from those in science and 
engineering to the information area; computing with a hint of communications 
— the original parallel computing thrust — becomes communications with some 
large-scale computing; compilers become interpreters; Fortran becomes Java; and 
there are many changes like these. We term this overall concept Internetics, which is 
the field centered on technologies, applications, and services enabled by worldwide 
computing and communications. This is defined to be interdisciplinary, as both 
base technologies and applications are included. It includes issues of large scale 
from all points of view, not just large individual parallel or distributed compute 
engines or networks, but also one web client talking to one server. This is large 
scale for a different reason — it is pervasive. 


As described in [1, 2], we found in our work at Caltech that interdisciplinary 
research at the interface of computer science and areas such as physics and chemistry 
was very rewarding [3]. Indeed, individuals who knew both areas seemed well 
placed to lead the expected surge of interest in large-scale simulations using parallel 
systems. Several other groups came to similar conclusions, and the academic field 
of computational science was set up in several universities, and studied in many 
conferences [4]. However, this initiative seems to have stalled as student interest 
has shifted in both the computer science and application areas. In the latter case, in 
fact, fields like physics are seeing a general drop in enrollment, as physics is 
perceived as a difficult area in which to get jobs. Classic computational science 
(computational physics) is not a large enough field to change this. We argue that 
Internetics combined with 
physics could, however, offer significant growth opportunities for this and similar 
traditional fields. 


16.2 Does Parallel Computing Work? 


It has been clear for some ten years or more that one can parallelize the majority of 
large-scale applications. Further, this parallelization scales and can be implemented 
on machines with very many nodes. The essential point is that one only needs to 
parallelize large problems, and these can usually be thought of as an algorithm 
applied iteratively to a dataset. The computation is large because the dataset 
is large. Then parallelism is achieved by breaking the dataset up into parts and 
placing one part in each processor. In Figures 16.1 and 16.2, we illustrate this for 
the examples we studied at Caltech: 


• Seismic wave propagation 
• Astrophysics 
• Computer Chess 
• Hadrian’s Palace (adapted from an earlier example using his wall) 


The datasets are respectively the terrain in which the waves are propagated; 
the universe in which the galaxies are simulated; the set of moves in a 
computer-generated decision tree; and the set of bricks that the masons must lay to 
build the wall and tile the floor. The latter analogy (where the “computation” is 
performed by humans and not digital systems) shows that domain decomposition and 
parallelism are well known and have established success in all aspects of the human 
experience. While at Caltech, I used to remark that NASA, when it needed to build 
a shuttle, did not, or rather could not, hire Superman to address the task; rather, 
some 50,000 workers were hired. These built the shuttles using a dynamic complex 
heterogeneous decomposition of this single problem. The workers had to be instructed 
(programmed) and arranged in a hierarchical set of teams (architecture), and the 
process was designed to ensure workers could proceed effectively and not spend too 
much time interfering with other team members (minimize inter-processor 
communication). This was accompanied by dynamic planning and assignment of tasks 
(adaptive dynamic resource management). The parallel computing terminology is 
placed in brackets, and it is clear that the fundamental computer science issues are 
familiar concepts in society and that in principle they should be “naturally” 
soluble. Further, one noted that Nature’s parallel systems all communicate via 
message passing, whether they be swarms of bees, colonies of ants, collections of 
neurons, or teams of human minds. 
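

As a minimal sketch of the domain decomposition described above, the following 
Python fragment breaks a one-dimensional dataset into contiguous parts, one per 
notional processor, and applies an iterative update to each part. The function 
names, the three-point averaging rule and the boundary treatment are illustrative 
assumptions and are not taken from the Caltech codes. The processors are simulated 
inside a single process, and the exchange of edge values between neighboring parts 
stands in for inter-processor message passing. 

```python
def step(cells, left, right):
    """One averaging sweep over `cells`, given the values just outside each end."""
    padded = [left] + cells + [right]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]


def relax_sequential(data, sweeps):
    """Reference version: a single worker owns the whole dataset."""
    cells = list(data)
    for _ in range(sweeps):
        # Boundary rule (an assumption): outside values copy the end cells.
        cells = step(cells, cells[0], cells[-1])
    return cells


def split(data, parts):
    """Break the dataset into `parts` contiguous chunks of nearly equal size."""
    if not 1 <= parts <= len(data):
        raise ValueError("need between 1 and len(data) parts")
    base, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        chunks.append(list(data[start:start + size]))
        start += size
    return chunks


def relax_decomposed(data, sweeps, parts):
    """Same computation, with the dataset divided among `parts` simulated processors."""
    chunks = split(data, parts)
    for _ in range(sweeps):
        # "Communication" phase: each part learns its neighbors' edge values.
        lefts = [chunks[p - 1][-1] if p > 0 else chunks[0][0]
                 for p in range(parts)]
        rights = [chunks[p + 1][0] if p < parts - 1 else chunks[-1][-1]
                  for p in range(parts)]
        # "Computation" phase: each part updates using local data plus edge values.
        chunks = [step(chunks[p], lefts[p], rights[p]) for p in range(parts)]
    return [x for chunk in chunks for x in chunk]
```

Each part needs only one value from each neighbor per sweep, so the communication 
grows with the number of parts while the computation grows with the size of the 
dataset. This is why a large dataset leaves room for many processors. 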


So we were very confident that parallel computing was possible, and set up the 
“Caltech Concurrent Computation Program” to demonstrate this. The parameters 
of hypercubes built first by Seitz and then JPL were chosen to support this type of 
parallelism. As rather tardily demonstrated [3], we successfully implemented over 
50 distinct significant parallel applications. 


Fig. 16.1. Three examples of parallel computing using domain decomposition to map 
problem onto computer. 


Fig. 16.2. Parallel Construction of Hadrian’s Palace: tilers laying dance floor in 
two-dimensional decomposition with masons building the wall with a one-dimensional 
decomposition. 


Now the above arguments have compelling generality but are, of course, super- 
ficial. There are important cases where parallelism is not trivial, including cases 
where time and not dataset size is the “large” parameter. Here, we looked at stud- 
ies of the motion of the solar system, with a few-way parallelism, over long time 
periods. Solar system studies (using parallelism over planets) cannot use massive 
parallelism directly but typically planetary evolution is sensitive to poorly known 
initial conditions. These would be studied by multiple runs with different parameter 
values. This exploratory work is, as they say today, pleasingly parallel (in Feyn- 
man’s day, it was “embarrassingly” parallel) and recovers our ability to use parallel 
systems. Gerry Sussman implemented this parallelism with a specialized digital 
orrery while on sabbatical at Caltech. This, like the hypercube, was discussed in 
Feynman’s class. A second and more important case is event-driven simulations, 
which are commonly used in the modeling and simulation of macroscopic systems. 
The military is a major user of this technology and in the U.S.A., this work is 
coordinated by the DMSO office [5]. Event-driven simulations must execute enti- 
ties in the time ordering of event occurrences and this essentially sequentializes the 
myriad of components. This is in principle insoluble, but in practice parallelism 
can be found, as events in a large simulation are geographically distributed and do 
not affect each other for long time periods. This feature, combined with various 
ingenious variations of the time warp rollback mechanism, actually allows 
large-scale event-driven simulations to run effectively even on relatively loosely 
coupled distributed systems. In fact, while we were at Caltech, investigation of 
such problems was a major activity of our collaboration with the Jet Propulsion 
Laboratory. 
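

The multiple-run strategy mentioned above for the solar system studies can be 
sketched as follows. Each run is sequential, but the runs are independent of one 
another, so they can be handed to separate workers with no communication between 
them. The toy `evolve` function (a unit harmonic oscillator advanced by a 
semi-implicit Euler step), the parameter values and the use of a thread pool from 
Python's standard library are assumptions made purely for illustration; this is 
not the method used in the digital orrery. 

```python
from concurrent.futures import ThreadPoolExecutor


def evolve(initial_position, steps=1000, dt=0.01):
    """Toy stand-in for one long sequential run (hypothetical model):
    a unit harmonic oscillator advanced by semi-implicit Euler steps."""
    x, v = initial_position, 0.0
    for _ in range(steps):
        v -= x * dt   # update velocity from the current position
        x += v * dt   # update position from the new velocity
    return x


def sweep(initial_positions, workers=4):
    """Run one independent simulation per initial condition."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(evolve, initial_positions))


if __name__ == "__main__":
    for x0, x_final in zip([0.9, 1.0, 1.1], sweep([0.9, 1.0, 1.1])):
        print(x0, x_final)
```

In CPython a thread pool will not speed up work of this kind; a process pool or a 
set of separate machines would be the natural substitute. The structure of the 
sweep stays the same either way, which is what makes it pleasingly parallel. 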


However, these nifty technical issues are not the reason why parallel computing is 
or is not successful. Rather, the critical point was explained one day in a wonderful 
public lecture by Carver Mead in Caltech’s Beckman Auditorium. He explained how 
the computing industry faced and would see many technology transitions. However, 
any new approach needed enough “headroom” to replace the old way. Changes in 
deployed technology are like the phase transitions in physics to which Feynman 
made so many contributions. 


Fig. 16.3. Phase Transitions in Complex Systems 


Systems can live for a long time in a “false minimum” that is the older technology 
if there is a substantial energy barrier to change, as shown in Figure 16.3. 
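

The picture of a system trapped in a “false minimum” can be made concrete with a 
toy calculation. The sketch below uses a hypothetical one-dimensional energy 
function with two wells separated by a barrier. The quartic form and its 
coefficients are assumptions chosen for illustration and are not taken from 
Figure 16.3. A simple downhill search stands in for the incremental improvement 
of a deployed technology. 

```python
def energy(x):
    """Hypothetical double-well energy: two minima separated by a barrier."""
    return x**4 - 2.0 * x**2 + 0.3 * x


def slope(x):
    """Derivative of energy(x)."""
    return 4.0 * x**3 - 4.0 * x + 0.3


def descend(x, rate=0.01, steps=5000):
    """Follow the local downhill direction only; nothing here can cross a barrier."""
    for _ in range(steps):
        x -= rate * slope(x)
    return x


if __name__ == "__main__":
    trapped = descend(0.5)    # starts on the side of the shallower well
    lowest = descend(-0.5)    # starts on the side of the deeper well
    print(trapped, energy(trapped))
    print(lowest, energy(lowest))
```

Started at x = 0.5, the search settles in the shallower well on the right and stays 
there, even though the well on the left has lower energy. Reaching the lower well 
requires a jump over the barrier, which purely local improvement never makes. 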


We can use a complex systems language as advocated by Feynman’s colleague, 
Gell-Mann, and the Santa Fe Institute. A complex system is a set of interconnected 
entities that, although governed at a low level by standard laws of physics, have 
interactions that are best described by a “macroscopic coarse graining.” As most 
enterprises involve some sort of optimization, one can usually associate a 
phenomenological energy function that is minimized by our complex system. For 
instance, the “no-arbitrage opportunity” used in economic modeling implies that one 
can view the stock market as a complex system. Trading is the heat bath providing a 
myriad of microscopic interactions that equilibrate this system, and financial 
instruments 
are priced by maximizing value. In the computer industry, market forces equilibrate 
the system while innovation causes the complex system to evolve while always max- 
imizing some unclear energy function representing customer satisfaction per unit 
dollar. 


After this digression, we return to our discussion of parallel computing. In the 
framework of Figure 16.3, taking an oversimplified view that captures the essence, 
there are indeed two minima: the “current sequential computing” and the “large 
scale parallel systems” minima. The technical computer science studies show that 
“Parallel Computing Works” so this second minimum is distinct, well defined, and 
lower than the “sequential minimum.” However, sequential computing technology 
has advanced dramatically over the last 15 years and the “headroom” shown in 
Figure 16.3 is not so great. We argued that parallel computing was inevitable, as 
the feature size reduction in chips implied one would “have to spend” one’s computer 
budget on parallel systems as technology reduced the unit sequential system cost. 
This argument is fallacious for two reasons. Firstly, the industry made the sequential 
chip architecture “better” as we increased from a few hundred thousand to the 
current several million transistors in each chip. This did involve parallelism, but 
only that which could be implemented without user intervention. Current chips are 
much faster than those of five years ago but, in fact, use transistors less efficiently, 
as “automatic parallelism” is not as efficient as (user directed) data parallelism. 
However, this approach stays in the “same minimum” of Figure 16.3 and requires 
no “phase transitions.” Thus, it is chosen by our market forces. A second, and 
perhaps more important, development is that users are, in effect, each spending 
less money on computing. The dominant thrust in the computing industry is not 
on a few very powerful systems but on widespread deployment of very many small 
systems (PCs). This is the critical point we missed: large-scale computing was, as 
we always said, inevitable, but the scaling included not just the number of 
processors (as we foretold), but also the number of users. Thus, the dominant system 
today is not a central closely coupled parallel system linking many individuals and 
their machines together. There are important problems that still need all the 
computing they can get. These include large-scale academic computations such as 
astrophysics 
and quantum chemistry, and many areas of importance to national security, such 
as the well-known U.S. Department of Energy ASCI program to model nuclear 
stockpiles. However, more generally, the anticipated growth has not been in this 
area, but rather in the distributed systems area where the user base has increased 
at the same rate as the deployment of computer power. 


Returning to Figure 16.3, the “barrier to change” is very large and this is central 
to understanding why the use of large-scale parallel systems has not expanded. We 
know good parallel algorithms for almost every important problem and can express 
these in efficient parallel software. Unfortunately, there are three major difficulties 
with this process. Firstly, there is usually no easy way to “port (migrate)” existing 
sequential code to parallel systems. Secondly, the current clean parallel languages 
are all low level and, in fact, not much better than what we used in Feynman’s 
courses 10-15 years ago. They are based around explicit user-specified message 
passing, as in the current PVM and MPI systems, or the CrOS system we used at 
Caltech. There are better higher-level systems, but these are not universal “silver 
bullets” and do not provide a clearly excellent broad-based programming 
environment, so we have no compelling high-level language that expresses the 
majority of problems. Thirdly, even for what we know how to do, there is a further 
difficulty. Namely, the high-performance parallel-computing field is of order 1% in 
dollar volume of the commodity computer market. However, it has all the software 
problems that PCs have plus all the additional parallel computing issues. Thus, the 
field does not have the capital investment or market size to be able to develop 
quality software. We have argued recently that this implies that the 
parallel-computing field should rethink its software strategy [6, 7]. It should 
build wherever possible on top of 
software produced for commodity markets, such as the web and business enterprise 
systems. Here we view parallel computers as a special case of distributed systems 
with especially tight synchronization constraints. 


Note that distributed computing assumes problems are already decomposed and 
designs software to access, store, and integrate decomposed parts together. Parallel 
computing’s central difficulty is different — it is finding a way of expressing tightly 
integrated problems in a way that they can be efficiently decomposed. We need 
to focus on this problem, and integrate it with tools taken where possible from 
the much larger commodity market. Previously, the high-performance 
parallel-computing community has tried to solve an essentially impossible problem: 
developing a complete programming environment from scratch with far fewer 
resources than are available to the existing sequential “false minimum” of 
Figure 16.3. 


Thus, we see that parallel computing established its possibilities but did not, 
in Carver Mead’s terminology, have enough headroom to effect a transition from the 
current sequential “meta-stable equilibrium state.” 


We can ask if this will change. Firstly, we have oversimplified, as always in such 
broad-based discussions. There are important areas, especially parallel databases, 
where the transition has been successful. This is understandable because once a 
single tool (the database) was parallelized, all applications of it could take advan- 
tage of parallel machines. Secondly, there is one compelling argument in favor of 
the inevitable adoption of parallel techniques. One notes that personal computer 
chips will “soon” have so many transistors that designers must use them to imple- 
ment parallelism. It is argued that this will force the commodity market to take 
parallelism seriously, and drive the pervasive deployment of this technology. This 
argument has some truth to it, but it is not clear that the degree of parallelism 
involved is enough. If one “just” needs to use up a “few-way” parallelism, then 
the functional approach (as used by (Java) threads in simultaneously processing 
different components of a web page) may be sufficient. Such parallelism avoids the 
critical difficulties of large-scale data parallelism, and will stay in the current minimum 
and not drive the phase transition of Figure 16.3. Thus, we expect, over the next 
five years, that the level of activity and importance of parallel computing will re- 
main roughly constant. It will not become the dominant force we thought in the 
early 1990s. We do expect that parallel-programming environments will improve 
significantly, but not enough to make it possible to easily leap across the boundary 
in Figure 16.3. 


16.3 Is Computational Science a New Academic Field? 


So while at Caltech, my studies of both high energy physics and parallel computing 
were helped immeasurably by an excellent group of students who I diligently trained 
in an interdisciplinary fashion. I have no doubt that this training was highly effec- 
tive and was essential for the generally accepted success of our activities. However, 
these students were typically not so successful on their graduation. Their training 
was a “jack of all trades” (or at least two trades) and getting a good job — espe- 
cially in universities — requires excellence in one recognized field. I have, for the 
last 10 years, recommended students not to perform interdisciplinary work until 
“they get tenure.” This advice flies somewhat in the face of the growing interest in 
interdisciplinary activities by funding agencies — especially the National Science 
Foundation. However, not entirely, as one can perform interdisciplinary activities 
in two ways; firstly, using one or more individuals — each of whose expertise spans 
multiple fields; secondly, one can build a team of specialized individuals whose 
combined knowledge spans multiple fields. The latter has, in my opinion, been the 
mode adopted successfully in most recent projects whereas at Caltech, I largely 
used the first model. Good universities find it hard to hire interdisciplinary faculty. 
The tenure review system, in spite of some flaws, has successfully built the qual- 
ity American research universities. This assumes there is a peer group inside and 
outside the university that can provide reliable information on which to judge the 
merits of faculty promotions. This is almost impossible to do in interdisciplinary 
fields where a candidate, whose work falls into multiple areas, will get less than 
perfect reviews in any of the component areas of his or her expertise. 


There is another more mundane problem with implementing interdisciplinary 
fields. Suppose we have N basic fields — physics, chemistry, biology, medicine, 
computer science, electrical engineering, environmental studies, and so on. Then 
we can design 2^N - 1 interdisciplinary areas by choosing any combination of these 
basic fields. This leads to a plethora of subjects with probably limited lifetimes 
and no good way to choose where to focus. Thus, it seems best to set up academic 
institutions with a few core subjects (a basic liberal arts education perhaps?) and 
to build around these an evolving web of interdisciplinary studies. Interdisciplinary 
work can be recognized by certificates, minors, masters or other “lesser degree” 
forms. Even this modest goal requires changes in university structure as currently 
the core subjects have too many requirements and a broader educational experience 
is hard. However, I believe these changes can and, in fact, probably will be made. 
In particular, one core field, physics, is seeing a major reduction in enrollment, as it is 
correctly perceived that there are few jobs in the “pure field” of physics. However, 
I have found physics is, in fact, an excellent training for general interdisciplinary 
research. Physics teaches problem solving based on fundamental principles — a 
good approach in most areas. 
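

The count of possible interdisciplinary areas grows quickly with the number of 
basic fields. As a small check of the arithmetic, the sketch below enumerates every 
nonempty combination of a set of fields and confirms that there are 2^N - 1 of 
them. The field list is the one named in the text; the function name is an 
illustrative assumption. Note that this count includes the N single-field choices; 
combinations of two or more fields number 2^N - N - 1. 

```python
from itertools import combinations


def nonempty_combinations(fields):
    """Return every nonempty subset of `fields` as a tuple."""
    return [combo
            for size in range(1, len(fields) + 1)
            for combo in combinations(fields, size)]


fields = ["physics", "chemistry", "biology", "medicine",
          "computer science", "electrical engineering", "environmental studies"]
areas = nonempty_combinations(fields)

# Every nonempty subset, including the seven single fields themselves.
print(len(areas), 2 ** len(fields) - 1)

# Only the genuinely interdisciplinary ones (two or more fields).
mixed = [combo for combo in areas if len(combo) >= 2]
print(len(mixed), 2 ** len(fields) - len(fields) - 1)
```

With only the seven fields listed, there are already more than a hundred candidate 
areas, and each added field doubles the total. 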


Fig. 16.4. A classical view of computational science as implemented at Syracuse in the 
early 1990s. 


My attempts to understand the academic role of computational science revealed 
another bothersome recurring theme. Namely, nobody could agree as to what it was. 
Individuals at Caltech insisted it was the same as what I call (Σ_i computational X_i), 
i.e., an amalgam of computational physics, chemistry, etc. However, NSF, in its 
recent “partners for advanced computational infrastructure” solicitation, I think 
views it, as I do, more broadly, as shown in Figure 16.4. This includes as well 
“applied computer science” or those aspects of computer science involved with 
hardware, software, and algorithms of scientific and engineering computation. Most 
academic implementations have, in fact, given computational science a central home 
in some sort of computer science or applied mathematics department and so emphasize 
the last component. The academic computing tower of Babel is further confused by 
the fields of computer engineering, computational science and engineering, and 
scientific computing. These take the fields we have discussed and give them 
different emphases, leading again to good educational opportunities with unclear 
national recognition. The major problem for students in computational science is 
that most employers generally have no idea what the term “computational science” 
means, and computer science is the best academic degree with which to hunt for 
jobs. This being said, it is also true that it is the applied and not the theoretical 
computer science skills that most employers demand. Thus, the strategy explained 
at the beginning of this section is sound. Get a degree labeled by a basic, well 
understood field such as computer science, but arrange one’s studies to obtain 
“lesser degrees” (master’s, certificates) demonstrating proficiency in fields like 
computational science, which teach good practical skills. 


After these general remarks on interdisciplinary research, let us discuss com- 
putational science defined as the academic field lying in between computer science 
(and applied mathematics/computer engineering) and the various fields of science 
and engineering that use high-performance (parallel) computing. We show in 
Figure 16.4 the particular view of computational science we developed at Syracuse 
University. This was designed in accordance with earlier remarks to be 
implemented within the existing academic framework and not to require a new academic 
unit. This is consistent with approaches at other universities as described in John 
Rice’s fine summary [4]. 


Our vision for the success of this field is well captured in these words from 
Daedalus [2] in 1992. 


“This essay is constructed around a single premise: the inexorable in- 
crease in the performance of computers can open up new vistas in es- 
sentially all fields. We need skilled people to explore and exploit these 
possibilities, however, and our educational system is behind the times. 
Current curricula at grade schools and colleges will not educate stu- 
dents to exploit the possibilities opened up by parallel computers and 
the emergence of the computational methodology. Furthermore, the 
young but relatively traditional field of computer science will only give 
us a small fraction of the scientists in the computational wave that will 
lead the revolution. Computer scientists will develop the wonderful ma- 
chines — a critical enabling technology. However, what we need most 
are computational scientists — individuals trained to use computers. 
High-performance computing is critical to the nation’s needs. The Gulf 
War illustrated this in our military, but the future battles will increas- 
ingly be economic. Thus, high-performance computers can assure the 
industrial competitiveness of the nation, but this can only be true if we 
educate those who can use parallel computers in new ways for industry.” 


These were sound arguments, but they embody the flaw exposed in Section 16.2. 
Computation will open up new vistas in essentially all fields, but parallel 
computing on its own will not. I moved from permanent summer to winter (Caltech 
to Syracuse) because I wanted to implement the original computational science 
vision broadly in a university closely linked to the real world (industry). 
However, I soon realized the flaw: when I surveyed New York State industry 
[8-12], I found little interest in parallel computing. In fact, at Syracuse, I 
developed some reasonable core courses in computational science [CPS615, 713; see 
http://www.npac.syr.edu/Education], but after peaking with some 50 students and 
two sections in the early 90s, student interest has waned and the current enrollment 
is down by an order of magnitude. Syracuse students are pretty bright, but they 
tend to be pragmatic and go to courses where they think there are jobs to be found. 
Anticipating the discussion of the next section, over the period 1995-98, enrollment 
in web technology classes has grown by an order of magnitude. 


So we do need to change university curricula as computing is impacting every 
field and this must be reflected in the material students learn. However, I now 
believe that our original vision of computational science was too limited, and not 
broad enough to survive the inevitable slings and arrows of uncertain technological 
progress. We do need some key characteristics — in particular, strong flexible 
core subject curricula with enough latitude that there is room for students to take 
interdisciplinary studies. This is a sound lesson we have learnt from the experiments 
in computational science. 


16.4 Internetics — The Correct Vision? 


In the last two sections, we described parallel computing as the expected driving 
technology that would increase the role of computation in all fields and so drive a 
new interdisciplinary academic field of computational science. This vision was not 
quite right as it did not anticipate the rapid improvement in sequential 
architectures. Further, it missed a critical feature of “large-scale”: namely, it 
was more important to scale the number of people involved than to scale the power of 
individual computers. 


The new technology vision is the widespread deployment of computational 
devices and communication links with essentially identical architecture, whether 
in a central massively parallel server or a distributed set of digital set-top boxes in 
a suburban community. All are linked commodity processors exchanging messages 
between themselves. This scenario will not be trapped in a niche market that is 
1% of the total but rather will overtake all computer systems. In this world, the 
operating system seen by the users is WebWindows [13] as currently illustrated by 
the integration of Microsoft’s Internet Explorer with the PC Windows Operating 
System. According to basic market principles, web technology will lead to 
the best available software as it addresses the largest possible market, and so can 
amortize software development costs over the largest possible volume. Further, 
the web has a particularly good creative model, as its modular distributed software 
is designed and built by a loosely knit worldwide software team. Note we 
have had previous pervasive software models such as those of IBM mainframes or of 
the PC itself. The web is different from these, as it is a pervasive complete software 
model that addresses distributed heterogeneous computing. We can regard any 
other computing system as a special case of the web. Studying parallel computing 
will start with its distributed computing base, and as mentioned earlier, add support 
for tight synchronization. Studying military or electronic commerce applications 
will add security to the mix, while CORBA is naturally linked to the web to satisfy 
the need for managed distributed objects. 
Fig. 16.5. Professor Xiaoming Li’s view of Internetics


We can follow Xiaoming Li and call the resultant field Internetics. This is the 
study of technologies enabling, and applications enabled by, the world wide, large- 
scale, object web hardware and software infrastructure. As shown in Figure 16.5, 
this field includes computing, but also a rich collection of networking and infor- 
mation infrastructure, services, and tools. We had realized the importance of this 
area from our sad survey at Syracuse, which had shown that industry in New York 
State was not so interested in parallel computing. We did identify that although 
large-scale simulation was of “tertiary importance” (as told to me by a now defunct 
military aircraft company), information processing was of general interest. I also 
like to recount the tale of a large appliance company that could only find a small, 
beleaguered audience of six engineers for my talk on the value of simulation and 
high-performance computing. However, a few years later, they were ecstatic to learn 
from us how to link their product database to the web. 


Even while we were setting up a “classical” computational science program at 
Syracuse, we realized early on from the feedback from industry that it was incom- 
plete. So, in late 1995, we started an “information track” of the computational 
science program at Syracuse. The idea is illustrated in Figure 16.6 and generalizes 
the concept explained in Section 3 that computational science was at the interface 
of “applied computer science” and a set of applications. In the information track, 
we replace the science and engineering applications of Figure 16.4 with areas shown 
in Figure 16.6 where information processing is the key computation task. This in- 
cludes areas such as education, health care, crisis management, journalism (perhaps 
using the web for dissemination), and marketing. 


Fig. 16.6. The 1995 extension of Figure 16.4 at Syracuse to include all applications into
computational science. This is a forerunner of an academic implementation of Internetics.


Combining Figures 16.4 and 16.6, we find the academic implementation of Internetics
as the field that lies between modern applied computer science and application
areas. We have designed a tentative Internetics curriculum running from the K-12
(school children) to graduate level. It starts by teaching school children the essence 
of the web and how to program in Java. Java is a particularly good language for 
the K-12 age group, as it has good graphics and obvious utility in improving web 
pages. Thus, we can easily motivate the language and our beginning programmers 
get the gratification of better personal web pages to share with their peers. The 
technologies of Internetics are more social than those of the original computational
science. At the graduate level, we designed a six semester course certificate covering 
technologies (such as the basic Web, VRML, multimedia, collaboration, distributed 
objects) and a choice of application specializations, such as those in Figures 16.4 
and 16.6. 


We see a general trend towards Internetics (although, of course, typically not 
with this name) but so far there is not the necessary consensus to expect widespread 
adoption. For instance, as a subset of Internetics, an interesting field is called 
by some just “multimedia” and Syracuse scoped this out, but did not adopt a 
“masters in multimedia” program. However, we see that Internetics embodies the 
essential vision of computational science that the use of modern computers and 
communications systems will revolutionize many fields. Thus, it is essential for both 
academic and economic reasons to train a generation of students to be familiar with 
both computing and particular applications. In fact, there is substantial interest 
from industry in retraining existing workers in the techniques of Internetics.


Thus, although details of our original vision were flawed, it is included in the 
new broader picture, which will succeed. 
RICHARD FEYNMAN AND THE
CONNECTION MACHINE 


W. Daniel Hillis * 


17.1 Introduction 


One day in the spring of 1983, when I was having lunch with Richard Feynman, 
I mentioned to him that I was planning to start a company to build a parallel 
computer with a million processors. (I was at the time a graduate student at the 
MIT Artificial Intelligence Lab.) His reaction was unequivocal: “That is positively 
the dopiest idea I ever heard.” For Richard, a crazy idea was an opportunity to 
prove it wrong or prove it right. Either way, he was interested. By the end of lunch 
he had agreed to spend the summer working at the company. 


Richard had as much fun with computers as anyone I ever knew. His interest in 
computing went back to his days at Los Alamos, where he supervised the “Comput- 
ers” — that is, the people who operated the mechanical calculators there. He was 
instrumental in setting up some of the first plug-programmable tabulating machines
for physical simulation. His interest in the field was heightened in the late 1970s 
when his son Carl began studying computers at MIT. 


I got to know Richard through his son. Carl was one of the undergraduates 
helping me with my thesis project. I was trying to design a computer fast enough 
to solve commonsense reasoning problems. The machine, as we envisioned it, would 
include a million tiny computers, all connected by a communications network. We 
called it the Connection Machine. Richard, always interested in his son’s activities, 
followed the project closely. He was sceptical about the idea, but whenever we met 
at a conference or during my visits to Caltech, we would stay up until the early 
hours of the morning discussing details of the planned machine. Our lunchtime 
meeting on that spring day in 1983 was the first time he ever seemed to believe we 
were really going to try to build it.


Richard arrived in Boston the day after the company was incorporated. We had 
been busy raising the money, finding a place to rent, issuing stock and so on. We 
had found an old mansion just outside the city, and when Richard showed up we 
were still recovering from the shock of having the first few million dollars in the 
bank. No one had thought about anything technical for months. We were arguing 
about what the name of the company should be when Richard walked in, saluted
and said “Richard Feynman reporting for duty. OK, boss, what’s my assignment?” 


*This article was originally published in the February 1989 issue of Physics Today. 
The assembled group of not-quite-graduated MIT students was astounded. After
a hurried private discussion (“I don’t know, you hired him...” ) we informed Richard 
that his assignment would be to advise on the application of parallel processing to
scientific problems. “That sounds like a bunch of baloney,” he said. “Give me 
something real to do.” So we sent him out to buy some office supplies. While he 
was gone, we decided that the part of the machine we were most worried about 
was the router that delivered messages from one processor to another. We were not 
entirely sure that our planned design would work. When Richard returned from 
buying pencils, we gave him the assignment of analyzing the router. 


17.2 The Machine 


The router of the Connection Machine was the part of the hardware that allowed 
the processors to communicate. It was a complicated object; by comparison, the 
processors themselves were straightforward. Connecting a separate wire between 
every pair of processors was totally impractical; a million processors would require 
$10^{12}$ wires. Instead, we planned to connect the processors in the pattern of a
20-dimensional hypercube, so that each processor would only need to talk directly to
20 others. Because many processors had to communicate simultaneously, many 
messages would contend for the same wire. The router’s job was to find a free path 
through this 20-dimensional traffic jam or, if it couldn’t, to hold the message in a 
buffer until a path became free. Our question to Feynman was: “Had we allowed 
enough buffers for the router to operate efficiently?” 
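

The wiring argument above can be made concrete with a short sketch. The Python below is only an illustration under stated assumptions: the function names (`neighbors`, `route`, `wire_counts`) are hypothetical, the route simply corrects differing address bits in a fixed order, and nothing here models contention or buffering, which was the hard part of the real router.

```python
DIMENSIONS = 20  # 2**20 = 1,048,576 processors, roughly "a million"


def neighbors(node: int, dims: int = DIMENSIONS) -> list[int]:
    """Addresses wired directly to `node`: flip exactly one address bit."""
    return [node ^ (1 << d) for d in range(dims)]


def route(src: int, dst: int, dims: int = DIMENSIONS) -> list[int]:
    """One shortest path from src to dst: fix each differing bit, lowest first.

    Assumption: every wire is free. A real router must pick another
    dimension or buffer the message when the wanted wire is busy.
    """
    path = [src]
    current = src
    for d in range(dims):
        bit = 1 << d
        if (current ^ dst) & bit:
            current ^= bit  # cross the wire along dimension d
            path.append(current)
    return path


def wire_counts(dims: int = DIMENSIONS) -> tuple[int, int]:
    """(wires for every-pair wiring, wires for the hypercube)."""
    n = 1 << dims
    all_pairs = n * (n - 1) // 2   # about n**2 / 2, of order 10**12 for n = 2**20
    hypercube = n * dims // 2      # each processor has `dims` wires, shared by two ends
    return all_pairs, hypercube
```

With 20 dimensions, any message needs at most 20 hops, and the hypercube uses about ten million wires rather than hundreds of billions.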


In those first few months Richard began studying the router circuit diagrams 
as if they were objects of Nature. He was willing to listen to explanations of how 
and why things worked a certain way, but fundamentally he preferred to figure 
everything out himself. He would sit in the woods behind the mansion and simulate 
the action of each circuit with pencil and paper. Meanwhile, the rest of us, happy 
to have found something to keep Richard occupied, went about the business of 
ordering the furniture and computers, hiring the first engineers and arranging for 
the Defense Advanced Research Projects Agency to pay for the development of 
the first prototype. Richard did a remarkable job of focusing on his “assignment,” 
stopping only occasionally to help wire the computer room, set up the machine shop, 
shake hands with the investors, install the telephones and cheerfully remind us of 
how crazy we all were. When we finally picked the name of the company, Thinking 
Machines Corporation, Richard was delighted. “That’s good. Now I don’t have to 
explain to people that I work with a bunch of loonies. I can just tell them the name 
of the company.” 


The technical side of the project was definitely stretching our capacities. We 
had decided to simplify things by starting with only 64,000 processors, but even 
then the amount of work to be done was overwhelming. We had to design our 
own silicon integrated circuits, with processors and a router. We also had to invent 
packaging and cooling mechanisms, write compilers and assemblers, devise ways of
testing processors simultaneously and so on. Even simple problems like wiring the 
boards together took on a whole new meaning when you were working with tens 
of thousands of processors. In retrospect, if we had had any understanding of how 
complicated the project was going to be, we would never have started. 


17.3 ‘Get These Guys Organized’ 


I had never managed a large group before, and I was clearly in over my head. 
Richard volunteered to help out. “We've got to get these guys organized” he told 
me. “Let me tell you how we did it at Los Alamos.” It seems that every great 
man has a certain time and place in his life that he takes as a reference point ever 
after: a time when things worked as they were supposed to and great deeds were 
accomplished. For Richard, that time was at Los Alamos during the Manhattan 
project. Whenever things got “cockeyed,” Richard would look back and try to 
understand how now was different from then. Using this formula, Richard decided 
we should pick an expert in each area of importance to the machine — software,
packaging, electronics and so on — to become the “group leader” of that area, just
as it had been at Los Alamos. 


Part two of Feynman’s “Let’s Get Organized” campaign was a regular semi- 
nar series of invited speakers who might suggest interesting uses for our machine. 
Richard’s idea was that we should concentrate on people with new applications, 
because they would be less conservative about what kind of computer they would 
use. For our first seminar he invited John Hopfield, a friend of his from Caltech, 
to give us a talk on his scheme for building neural networks. In 1983, studying 
neural networks was about as fashionable as studying ESP, so some people considered
Hopfield a little crazy. Richard was certain he would fit right in at Thinking
Machines. 


What Hopfield had invented was a way of constructing an “associative memory,”
a device for remembering patterns. To use an associative memory, one trains it on 
a series of patterns — for example, pictures of letters of the alphabet. Later, when 
the memory is shown a new pattern, it is able to recall a similar pattern it has seen 
in the past. A new picture of the letter A will “remind” the memory of another 
A it has seen before. Hopfield figured out how such a memory could be built from 
devices functionally similar to biological neurons. 
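

A toy version of such an associative memory fits in a few lines. The sketch below is a generic textbook Hopfield network in plain Python, not the program written for the Connection Machine. The Hebbian training rule, the tiny 8-pixel patterns and all names are assumptions made for illustration.

```python
def train(patterns):
    """Hebbian rule: weights[i][j] sums the product of units i and j over all patterns."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def recall(weights, state, max_sweeps=10):
    """Update units one at a time until a full sweep changes nothing."""
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            total = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if total >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Two stored patterns of +1/-1 "pixels", chosen orthogonal so they do not interfere.
PATTERN_A = [1, 1, 1, 1, -1, -1, -1, -1]
PATTERN_B = [1, -1, 1, -1, 1, -1, 1, -1]
WEIGHTS = train([PATTERN_A, PATTERN_B])

# A corrupted copy of PATTERN_A with its first pixel flipped.
NOISY_A = [-1, 1, 1, 1, -1, -1, -1, -1]
RECALLED = recall(WEIGHTS, NOISY_A)  # settles back to PATTERN_A
```

Each unit's update depends only on the weighted sum of the other units, which is why one processor per neuron is a natural mapping.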


Not only did Hopfield’s method seem to work; it seemed to work particularly 
well on the Connection Machine. Feynman figured out the details of how to use 
one processor to simulate each of Hopfield’s neurons, with the strength of each
connection represented as a number in the processor’s memory. Because of the 
parallel nature of Hopfield’s algorithm, all the processors could be used concurrently 
with 100 percent efficiency; the Connection Machine would thus be hundreds of 
times faster than any conventional computer. 
17.4 An Algorithm for Logarithms


Feynman worked out in some detail the program for computing Hopfield’s network 
on the Connection Machine. The part that he was proudest of was the subroutine for 
computing a logarithm. I mention it here not only because it is a clever algorithm, 
but also because it is a specific contribution Richard made to the mainstream of 
computer science. He had invented it at Los Alamos. 


Consider the problem of finding the logarithm of a fractional number between 
1 and 2. (The algorithm can be generalized without too much difficulty.) Feyn- 
man observed that any such number can be uniquely represented as a product of 
numbers of the form $1 + 2^{-k}$, where k is an integer. Testing for the presence of
each of these factors in a binary representation is simply a matter of a shift and 
a subtraction. Once the factors are determined, the logarithm can be computed 
by adding together the precomputed logarithms of the factors. The algorithm fit 
the Connection Machine especially well because the small table of the logarithms 
of $1 + 2^{-k}$ could be shared by all the processors. The entire computation took less
time than doing a division. 
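

The procedure described above can be sketched as follows. This is a reconstruction from the description only, not the Connection Machine subroutine. The base-2 logarithm, the 30-bit fixed-point format, the greedy largest-factor-first order and the names are all assumptions.

```python
import math

FRACTION_BITS = 30
ONE = 1 << FRACTION_BITS  # fixed-point representation of 1.0

# Small precomputed table: log2(1 + 2**-k) for k = 1 .. FRACTION_BITS.
LOG_TABLE = [math.log2(1.0 + 2.0 ** -k) for k in range(1, FRACTION_BITS + 1)]


def feynman_log2(x: float) -> float:
    """Approximate log2(x) for 1 <= x < 2 using only shifts, adds and compares."""
    if not 1.0 <= x < 2.0:
        raise ValueError("x must lie in [1, 2)")
    target = int(x * ONE)  # x as a fixed-point integer
    product = ONE          # running product of the accepted factors
    result = 0.0
    for k, log_factor in enumerate(LOG_TABLE, start=1):
        # product * (1 + 2**-k) is one shift and one add in fixed point.
        candidate = product + (product >> k)
        if candidate <= target:   # the factor is "present" in x
            product = candidate
            result += log_factor  # add its precomputed logarithm
    return result
```

For example, `feynman_log2(1.5)` accepts only the factor with k = 1 and returns the table entry for log2(1.5). Other inputs agree with `math.log2` to within the truncation error of the fixed-point arithmetic.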


Concentrating on the algorithm for a basic arithmetic operation was typical of 
Richard’s approach. He loved the details. In studying the router he paid attention 
to the action of each individual gate, and in writing the program he insisted on 
understanding the implementation of every instruction. He distrusted abstractions 
that could not be directly related to the facts. When, several years later, I wrote a 
general-interest article on the Connection Machine for Scientific American, he was 
disappointed that it left out too many details. He asked “How is anyone supposed 
to know that this isn’t just a bunch of crap?” 


Feynman’s insistence on looking at the details helped us discover the potential 
of the machine for numerical computing and physical simulation. We had thought 
that the Connection Machine would not be efficient at “number crunching,” because 
the first prototype had no special hardware for vectors or floating-point arithmetic. 
Both of these were “known” to be requirements for number crunching. Feynman 
decided to test this assumption on a problem he was familiar with in detail: quantum 
chromodynamics. Quantum chromodynamics is the presently accepted field theory
of the strongly interacting elementary particles in terms of their constituent quarks
and gluons. It can, in principle, be used to compute the mass of the proton (in units 
of the pion mass). In practice, such a computation might require so much arithmetic 
that it would keep the fastest computers in the world busy for years. One way to 
do the calculation is to use a discrete four-dimensional lattice to model a section 
of space-time. Finding the solution involves adding up the contributions of all the 
possible configurations of certain matrices at the links of the lattice, or at least 
some large representative sample. (This is essentially a Feynman path integral.) 
What makes this so difficult is that calculating the contribution of even a single 
configuration involves multiplying the matrices around every loop in the lattice, 
and the number of loops grows as the fourth power of the lattice size. Because all
these multiplications can take place concurrently, there is plenty of opportunity to 
keep all 64,000 processors busy. 


To find out how well this would work in practice, Feynman had to write a com- 
puter program for quantum chromodynamics. Because BASIC was the only com- 
puter language Richard was really familiar with, he made up a parallel-processing 
version of BASIC in which he wrote the program. He then simulated the operation 
of the program by hand to estimate how fast it would run on the Connection Ma- 
chine. He was excited by the results: “Hey Danny, you’re not gonna believe this, 
but that machine of yours can actually do something useful!” According to Feyn- 
man’s calculations, the Connection Machine, even without any special hardware for 
floating-point arithmetic, would out-perform a machine that Caltech was building 
explicitly for quantum chromodynamics calculations. From that point on, Richard 
pushed us more and more toward looking at numerical applications of the machine. 


By the end of that summer of 1983, Richard had completed his analysis of the 
behavior of the router, and much to our surprise and amusement, he presented his 
answer in the form of a set of partial differential equations. To a physicist this may 
seem natural, but to a computer designer it seems a bit strange to treat a set of 
Boolean circuits as a continuous, differentiable system. Feynman’s router equations 
were written in terms of variables representing continuous quantities such as “the 
average number of 1 bits in a message address.” I was much more accustomed to 
inductive proof and case analysis than to taking the time derivative of “the number 
of 1’s.” Our discrete analysis said we needed seven buffers per chip; Feynman’s
differential equations suggested we only needed five. We decided to play it safe and 
ignore Feynman. 


The decision to ignore Feynman’s analysis was made in September, but by the 
following spring we were up against a wall. The chips we had designed were slightly 
too big to manufacture, and the only way to solve the problem was to cut the 
number of buffers per chip back to five. Because Feynman’s equations claimed we 
could do this safely, his unconventional methods of analysis started looking better 
and better to us. We decided to go ahead and make the chips with the smaller 
number of buffers. Fortunately, Feynman was right. When we put together the 
chips, the machine worked. The first program run on the machine was John Horton 
Conway’s Game of Life, in April 1985. 


17.5 Cellular Automata 


The Game of Life is an example of a class of computations that interested Feyn- 
man: cellular automata. Like many physicists who had spent their lives going to 
successively lower levels of subatomic detail, Feynman often wondered what was 
at the bottom. One possible answer was a cellular automaton. The notion is that 
the space-time continuum might ultimately be discrete, and that the observed laws 
of physics might simply be large-scale consequences of the average behavior of tiny
cells. Each cell could be a simple automaton that obeys a small set of rules and
communicates only with its nearest neighbors — like the points in the lattice calculation
for quantum chromodynamics. If the universe in fact works this way, there should 
be testable consequences, such as an upper limit on the density of information per 
cubic meter of space. 
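

Conway's Game of Life, mentioned above as the first program run on the machine, is a convenient example of such a rule. The sketch below is a plain sequential Python version on a small wrap-around grid. It is an illustration only, not the 1985 Connection Machine code, and the grid size and names are assumptions.

```python
def life_step(grid):
    """Advance a 0/1 grid by one generation, with wrap-around edges."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Each cell looks only at its eight immediate neighbors.
            live_neighbors = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # Birth on exactly three neighbors; survival on two or three.
            if live_neighbors == 3 or (grid[r][c] == 1 and live_neighbors == 2):
                new_grid[r][c] = 1
    return new_grid


# A "blinker": three live cells in a row, which flips between
# horizontal and vertical every generation.
BLINKER = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
```

Because every cell applies the same rule to purely local data, all cells can be updated at once, one per processor.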


The notion of cellular automata goes back to John von Neumann and Stanislaw
Ulam, whom Feynman had known at Los Alamos. Richard’s recent interest in 
the subject was aroused by his friends Ed Fredkin and Stephen Wolfram, both of 
whom were fascinated by cellular automata as models of physics. Feynman was 
always quick to point out to them that he considered their specific models “kooky,” 
but like the Connection Machine, he considered the subject crazy enough to put 
some energy into. There are many potential problems with cellular automata as a 
model of physical space and time — for example, finding a set of rules that gives 
relativistic invariance at the observable scale. One of the first problems is just 
making the physics rotationally invariant. The most obvious patterns of cellular 
automata, such as a fixed three-dimensional grid, have preferred directions along 
the grid axes. Is it possible to implement even Newtonian physics on a fixed lattice 
of automata? 


Feynman had a proposed solution to the anisotropy problem that he attempted 
(without success) to work out in detail. His notion was that the underlying au- 
tomata, rather than being connected in a regular lattice like a grid or a pattern of 
hexagons, might be randomly connected. Waves propagating through this medium 
would, on average, propagate at the same rate in every direction. 


Cellular automata started getting attention at Thinking Machines in 1984 when 
Wolfram suggested that we should use such automata not as a model of Nature, but 
as a practical approximation method for simulating physical systems. Specifically, 
we could use one processor to simulate each cell with neighbor-interaction rules cho- 
sen to model something useful, like fluid dynamics. Wolfram was at the Institute for 
Advanced Study in Princeton, but he was also spending time at Thinking Machines. 
For two-dimensional problems there was a neat solution to the anisotropy problem. 
It had recently been shown that a hexagonal lattice with a simple set of rules gives 
rise to isotropic behavior on the macroscopic scale. Wolfram did a simulation of 
this kind with hexagonal cells on the Connection Machine. It produced a beautiful 
movie of turbulent fluid flow in two dimensions. Watching the movie got all of us, 
especially Feynman, excited about physical simulation. We all started planning ad- 
ditions to the hardware, such as support for floating-point arithmetic, which would 
make it possible to perform and display a variety of simulations in real time. 
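

A minimal sketch of a hexagonal lattice gas of this general kind is given below. The text does not state the rules Wolfram actually used, so the collision rules here (head-on pairs scatter into one of the other two head-on pairs, symmetric triples rotate by 60 degrees) are an assumption in the spirit of FHP-style models. The axial coordinates, periodic boundaries and names are also illustrative choices.

```python
import random

# Six unit velocities on a hexagonal lattice in axial coordinates (q, r),
# listed in 60-degree steps; direction i + 3 is opposite to direction i.
DIRECTIONS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

# Each site is a 6-bit mask: bit i set means a particle moving in direction i.
HEAD_ON = [0b001001, 0b010010, 0b100100]   # pairs (0,3), (1,4), (2,5)
TRIPLE_A, TRIPLE_B = 0b010101, 0b101010    # directions (0,2,4) and (1,3,5)


def collide(mask, rng):
    """Rearrange the particles at one site, conserving number and momentum."""
    if mask in HEAD_ON:
        return rng.choice([m for m in HEAD_ON if m != mask])
    if mask == TRIPLE_A:
        return TRIPLE_B
    if mask == TRIPLE_B:
        return TRIPLE_A
    return mask  # every other configuration passes through unchanged


def step(grid, rng):
    """One time step: collide at every site, then move each particle one site."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for q in range(cols):
            mask = collide(grid[r][q], rng)
            for i, (dq, dr) in enumerate(DIRECTIONS):
                if mask & (1 << i):
                    new_grid[(r + dr) % rows][(q + dq) % cols] |= 1 << i
    return new_grid


def particle_count(grid):
    return sum(bin(mask).count("1") for row in grid for mask in row)


def momentum(grid):
    """Total momentum as a (q, r) pair in axial coordinates."""
    total_q = total_r = 0
    for row in grid:
        for mask in row:
            for i, (dq, dr) in enumerate(DIRECTIONS):
                if mask & (1 << i):
                    total_q += dq
                    total_r += dr
    return total_q, total_r
```

The point of the sketch is that particle number and total momentum are conserved exactly at every step, which is what lets the large-scale average behave like a fluid.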
17.6 Feynman the Explainer


In the meantime, we were having a lot of trouble explaining to people what we were 
doing with cellular automata. Eyes tended to glaze over when we started talking 
about state transition diagrams and finite-state machines. Finally Feynman told us 
to explain it like this: 


We have noticed in Nature that the behavior of a fluid depends very little 
on the nature of the individual particles in that fluid. For example, the 
flow of sand is very similar to the flow of water or the flow of a pile of 
ball bearings. We have therefore taken advantage of this fact to invent 
a type of imaginary particle that is especially simple for us to simulate. 
This particle is a perfect ball bearing that can move at a single speed in 
one of six directions. The flow of these particles on a large enough scale 
is very similar to the flow of natural fluids. 


This was a typical Feynman explanation. On the one hand, it infuriated the experts 
who had worked on the problem because it did not even mention all of the clever 
problems that they had solved. On the other hand, it delighted the listeners because 
they could walk away with a real understanding of the calculation and how it was 
connected to physical reality.


We tried to take advantage of Richard’s talent for clarity by getting him to 
criticize the technical presentations we made in our product introductions. Before 
the commercial announcement of the first Connection Machine, CM-1, and all of 
our subsequent products, Richard would give a sentence-by-sentence critique of the 
planned presentation. “Don’t say ‘reflected acoustic wave.’ Say echo.” Or, “Forget 
all that ‘local minima’ stuff. Just say there’s a bubble caught in the crystal and you 
have to shake it out.” Nothing made him angrier than making something simple 
sound complicated. 


Getting Richard to give advice like that was sometimes tricky. He pretended 
not to like working on any problem that was outside his claimed area of expertise. 
Often, when one of us asked him for advice, he would gruffly refuse with “That’s
not my department.” I could never figure out just what his department was, but it 
didn’t matter anyway, because he spent most of his time working on these “not my 
department” problems. Sometimes he really would give up, but more often than 
not he would come back a few days after his refusal and remark “I’ve been thinking 
about what you asked the other day and it seems to me...” This worked best if 
you were careful not to expect it. 


I do not mean to imply that Richard was hesitant to do the “dirty work.” In 
fact he was always volunteering for it. Many a visitor at Thinking Machines was 
shocked to see that we had a Nobel laureate soldering circuit boards or painting 
walls. But what Richard hated, or at least pretended to hate, was being asked to 
give advice. So why were people always asking him for it? Because even when
Richard didn’t understand, he always seemed to understand better than the rest of 
us. And whatever he understood, he could make others understand as well. Richard 
made people feel like children do when a grown-up first treats them as adults. He 
was never afraid to tell the truth, and however foolish your question was, he never 
made you feel like a fool. 


The charming side of Richard helped people forgive him for his less charming 
characteristics. For example, in many ways Richard was a sexist. When it came 
time for his daily bowl of soup, he would look around for the nearest “girl” and ask 
if she would bring it to him. It did not matter if she was the cook, an engineer or 
the president of the company. I once asked a female engineer who had just been 
a victim of this treatment if it bothered her. “Yes, it really annoys me” she said. 
“On the other hand, he’s the only one who ever explained quantum mechanics to 
me as if I could understand it.” That was the essence of Richard’s charm. 


17.7 A Kind of Game 


Richard worked at the company on and off for the next five years. Floating-point 
hardware was eventually added to the machine, and as the machine and its suc- 
cessors went into commercial production, they were being used more and more for 
the kind of numerical simulation problems Richard had pioneered with his quan- 
tum chromodynamics program. Richard’s interest shifted from the construction of 
the machine to its applications. As it turned out, building a big computer is a 
good excuse for talking with people who are working on some of the most exciting 
problems in science. We started working with physicists, astronomers, geologists, 
biologists, chemists — each of them trying to solve some problem that couldn’t have 
been solved before. Figuring out how to do such calculations on a parallel machine 
required understanding their details, which was exactly the kind of thing Richard 
loved to do. 


For Richard, figuring out these problems was a kind of game. He always started 
by asking very basic questions like “What is the simplest example?” or “How can 
you tell if the answer is right?” (He asked questions until he had reduced the 
problem to some essential puzzle he thought he could solve.) Then he would set 
to work, scribbling on a pad of paper and staring at the results. While he was in 
the middle of this kind of puzzle-solving, he was impossible to interrupt. “Don’t 
bug me. I’m busy” he would say without even looking up. Eventually he would 
either decide the problem was too hard (in which case he lost interest), or he would 
find a solution (in which case he spent the next day or two explaining it to anyone 
who would listen). In this way he helped work on problems in database searching, 
geophysical modeling, protein folding, image analyzing and the reading of insurance 
forms. 


The last project I worked on with Richard was in simulated evolution. I had written
a program that simulated the evolution of populations of sexually reproducing
creatures over hundreds of thousands of generations. The results were surprising, 
in that the fitness of the population made progress in sudden leaps rather than by 
the expected steady improvement. The fossil record shows some evidence that real 
biological evolution might also exhibit such “punctuated equilibrium,” so Richard 
and I decided to look more closely at why it was happening. He was feeling ill by 
that time, so I went out and spent the week with him in Pasadena. We worked out 
a model of evolution of finite populations based on the Fokker-Planck equations. 
When I got back to Boston, I went to the library and discovered a book by Motoo 
Kimura on the subject. Much to my disappointment, all our “discoveries” were 
covered in the first few pages. When I called Richard and told him what I had 
found, he was elated. “Hey, we got it right!” he said. “Not bad for amateurs.” 


In retrospect I realize that in almost everything we worked on together, we were 
both amateurs. In digital physics, neural networks, even parallel computing, we 
never really knew what we were doing. But the things that we studied were so new 
that none of the others working in these fields knew exactly what they were doing 
either. It was amateurs who made the progress. 


17.8 Telling the Good Stuff You Know 


Actually, I doubt that it was “progress” that most interested Richard. He was al- 
ways searching for patterns, for connections, for a new way of looking at something, 
but I suspect his motivation was not so much to understand the world, as it was to 
find new ideas to explain. The act of discovery was not complete for him until he 
had taught it to someone else. 


I remember a conversation we had a year or so before his death, walking in 
the hills above Pasadena. We were exploring an unfamiliar trail, and Richard, 
recovering from a major operation for his cancer, was walking more slowly than 
usual. He was telling a long and funny story about how he had been reading up on 
his disease and surprising his doctors by predicting their diagnosis and his chances 
of survival. I was hearing for the first time how far his cancer had progressed, 
so the jokes did not seem so funny. He must have noticed my mood, because he 
suddenly stopped the story and asked, “Hey, what’s the matter?” I hesitated. “I’m 
sad because you’re going to die.” “Yeah,” he sighed, “that bugs me sometimes too. 
But not so much as you think.” And after a few more steps, “When you get as old 
as I am, you start to realize that you’ve told most of the good stuff you know to 
other people anyway.” 


We walked along in silence for a few minutes. Then we came to a place where 
another trail crossed ours and Richard stopped to look around at the surroundings. 
Suddenly a grin lit up his face. “Hey,” he said, all trace of sadness forgotten, “I bet 
I can show you a better way home.” And so he did. 
CRYSTALLINE COMPUTATION


Norman Margolus 


Abstract 


Discrete lattice systems have had a long and productive history in physics. Ex- 
amples range from exact theoretical models studied in statistical mechanics to ap- 
proximate numerical treatments of continuum models. There has, however, been 
relatively little attention paid to exact lattice models which obey an invertible dy- 
namics: From any state of the dynamical system you can infer the previous state. 
This kind of microscopic reversibility is an important property of all microscopic 
physical dynamics. Invertible lattice systems become even more physically realis- 
tic if we impose locality of interaction and exact conservation laws. In fact, some 
invertible and momentum conserving lattice dynamics—in which discrete particles 
hop between neighboring lattice sites at discrete times—accurately reproduce hy- 
drodynamics in the macroscopic limit. 


These kinds of discrete systems not only provide an intriguing information- 
dynamics approach to modeling macroscopic physics, but they may also be supremely 
practical. Exactly the same properties that make these models physically realistic 
also make them efficiently realizable. Algorithms that incorporate constraints such 
as locality of interaction and invertibility can be run on microscopic physical hard- 
ware that shares these constraints. Such hardware can, in principle, achieve a higher 
density and rate of computation than any other kind of computer. 


Thus it is interesting to construct discrete lattice dynamics which are more 
physics-like both in order to capture more of the richness of physical dynamics in 
informational models, and in order to improve our ability to harness physics for 
computation. In this chapter, we discuss techniques for bringing discrete lattice 
dynamics closer to physics, and some of the interesting consequences of doing so. 


18.1 Introduction 


In 1981, Richard Feynman gave a talk at a conference hosted by the MIT Informa- 
tion Mechanics Group. This talk was entitled “Simulating Physics with Comput- 
ers,” and is reproduced in this volume. 


In this talk Feynman asked whether it is possible that, at some extremely micro- 
scopic scale, Nature may operate exactly like discrete computer-logic. In particular, 
he discussed whether crystalline arrays of logic called Cellular Automata (CA) might 
be able to simulate our known laws of physics in a direct fashion. This question
had been the subject of long and heated debates between him and his good friend 
Edward Fredkin (the head of the MIT Group) who has long maintained that some 
sort of discrete classical-information model will eventually replace continuous dif- 
ferential equations as the mathematical machinery used for describing fundamental 
physical dynamics [31, 33]. 


For classical physics, Feynman could see no fundamental impediment to a very 
direct CA simulation. For quantum physics, he saw serious difficulties. In addi- 
tion to discussing well known issues having to do with hidden variables and non- 
separability, Feynman brought up a new issue: Simulation efficiency. He pointed 
out that, as far as we know, the only general way to simulate a lattice of quan- 
tum spins on an ordinary computer takes an exponentially greater number of bits 
than the number of spins. This kind of inefficiency, if unavoidable, would make it 
impossible to have a CA simulation of quantum physics in a very direct manner. 


Of course the enormous calculation needed to simulate a spin system on an or- 
dinary computer gives us the result of not just a single experiment on the system, 
but instead approximates the complete statistical distribution of results for an in- 
finite number of repetitions of the experiment. Feynman made the suggestion that 
it might be more efficient to use one quantum system to simulate another. One 
could imagine building a new kind of computer, a quantum spin computer, that 
was able to mimic the quantum dynamics of any spin system using about the same 
number of spins as the original system. Each simulation on the quantum computer 
would then act statistically like a single experiment on the original spin system. 
This observation, that a quantum computer could do some things easily that we
don’t know how to do efficiently classically, stimulated others to look for and find
algorithms for quantum computers that are much faster than any currently known 
classical equivalents [37, 80]. In fact, if we restrict our classical hardware to per- 
form the “same kind” of computation as the quantum hardware—rather than to 
just solve the same problem—then we can actually prove that some quantum com- 
putations are faster. These fast quantum computations present further challenges 
to hypothetical classical-information models of quantum physics [44]. 


Despite such difficulties, Feynman did not rule out the possibility that some 
more subtle approach to the efficient classical computational modeling of physics 
might yet succeed. He found something very tantalizing about the relationship 
between classical information and quantum mechanics, and about the fact that 
in some ways quantum mechanics seems much more suited to being economically 
simulated with bits than classical mechanics: Unlike a continuous classical system, 
the entropy of a quantum system is finite. The informational economy of quantum 
systems that Feynman alluded to has of course long been exploited in statistical 
mechanics, where classical bits are sometimes used to provide finite combinatorial 
models that reproduce some of the macroscopic equilibrium properties of quantum 
systems [45, 46]. It is natural then to ask how much of the macroscopic dynamical
behavior of physical systems can also be captured with simple classical information
models. This is an interesting question even if your objective is not to revolutionize 
quantum physics: We can improve our understanding of Nature by making simple 
discrete models of phenomena. 


My own interest in CA modeling of physics stems from exactly this desire to 
try to understand Nature better by capturing aspects of it in exact informational 
models. This kind of modeling in some ways resembles numerical computation of 
differential equation models, where at each site in a spatial lattice we perform com- 
putations that involve data coming from neighboring lattice sites. In CA modeling, 
however, the conceptual model is not a continuous dynamics which can only be 
approximated on a computer, but is instead a finite logical dynamics that can be 
simulated exactly on a digital computer, without roundoff or truncation errors. Ev- 
ery CA simulation is an exact digital integration of the discrete equations of motion, 
over whatever length of time is desired. Conservations can be exact, invertibility 
of the dynamics can be exact, and discrete symmetries can be exact. Continuous 
behavior, on the other hand, can only emerge in a large-scale average sense—in 
the macroscopic limit. CA models have been developed in which realistic classical 
physics behavior is recovered in this limit [18, 79]. 


Physics-like CA models are of more than conceptual and pedagogical interest. 
Exactly the same general constraints that we impose on our CA systems to make 
them more like physics also make them more efficiently realizable as physical devices. 
CA hardware that matches the structure and constraints of microscopic physical 
dynamics can in principle be more efficient than any other kind of computer: It 
can perform more logic operations in less space and less time and with less energy 
dissipation [27, 94]. It is also scalable: A crystalline array of processing elements can
be indefinitely extended. Finally, this kind of uniform computer is simpler to design, 
control, build and test than a more randomly structured machine. The prospect 
of efficient large-scale CA hardware provides a practical impetus for studying CA 
models. 


18.2 Modeling dynamics with classical spins 


From the point of view of a physicist, a CA model is a fully discrete classical field 
theory. Space is discrete, time is discrete, and the state at each discrete lattice point 
has only a finite number of possible discrete values. The most essential property of 
CA’s is that they emulate the spatial locality of physical law: The state at a given 
lattice site depends only upon the previous state at nearby neighboring sites. You 
can think of a CA computation as a regular spacetime crystal of processing events: 
A regular pattern of communication and logic events that is repeated in space and 
in time. Of course it is only the structure of the computer that is regular, not 
the patterns of data that evolve within it! These patterns can become arbitrarily 
complicated.


Discrete lattice models have been used in statistical mechanics since the 1920’s
[45, 46]. In such models, a finite set of distinct quantum states is replaced by a
finite set of distinct classical states. Consider, for example, a hypothetical quantum
system consisting of n spin-1/2 particles arranged on a lattice, interacting locally. The
spin behavior of such a system can be fully described in terms of 2^n distinct (mutually
orthogonal) quantum states. The Ising model accurately reproduces essential
aspects of phase-change behavior in such a system using n classical bits, which give
us 2^n distinct classical states. In the Ising model, at each site in our lattice we put
a classical spin: A particle that can be in one of two classical states. We define
bond energies between neighboring spins: We might say, for example, that two adjacent
spins that are parallel (i.e., are in the same state) have a bond energy of ε_=,
while two antiparallel (not same) neighbors have energy ε_≠. This gives us a classical
system which has many possible states, each of which has an energy associated
with it. In calculating the equilibrium properties of this system, we simply assume 
that the dynamics is complicated enough that all states with the same energy as we 
started with will appear with equal probability. Thus we ignore the actual quantum 
dynamics of the original spin system, and instead substitute an energy-conserving 
random pseudo-dynamics. 


We could equally well substitute any classical dynamics that has a sufficiently 
complicated evolution. We will consider one simple CA model that has been suc- 
cessfully used in this manner [19, 42, 74, 98]. We assume that we are dealing with 
an isolated spin system, not in contact with any heat bath, so that total energy 
must be exactly conserved. We will also impose the realistic constraint that the mi- 
croscopic dynamics of an isolated physical system must be exactly invertible: There 
must always be enough information in the current state to recover any previous 
state. This constraint helps to ensure that a deterministic dynamics explores its 
available state-space thoroughly, and so can be analyzed statistically—this issue is 
discussed in Section 18.4. 


We can construct a simple CA that has these properties, in which the next value 
of the spin at each site on a 2D square lattice only depends upon the current values 
of its four nearest neighbors. The rule is very simple: A given spin changes state 
if and only if this doesn’t change the total energy associated with its bonds to its 
four nearest neighbors. Equivalently, a given spin (bit) is flipped (complemented) if 
exactly two of its four neighbors are zero’s, and two are one’s. This doesn’t change 
its total bond energy: Both before and after the flip, it will be parallel to half of its 
neighbors (contributing 2ε_= to the total), and antiparallel to the rest (contributing
2ε_≠).


The rule as stated above would be fine if we updated just one spin on the lattice
at a time, but we would like to update the lattice in parallel. To make this work,
we will adopt a checkerboard updating scheme: We imagine that our lattice is a
giant black and white checkerboard, and we alternately hold the bits at all of the
black sites fixed while we update all of the white ones, and then hold the white
sublattice fixed while updating the black. In this way, the neighbors of a spin that
is changed are not also simultaneously changed, and so our logic about conserving
energy remains valid.


Fig. 18.1. An Ising CA. (a) A state that evolved from a random pattern of 90% 0’s and
10% 1’s. (b) Wave on the boundary between a domain of all 1’s, and a domain of all 0’s.
(c) Closeup of a portion of the boundary.


Now how can we add invertibility? We already have! If we apply our rule to 
the same checkerboard sublattice twice in a row, then each spin is either flipped 
twice or not at all—the net effect is no change. Thus the most recent step in our 
time evolution can always be undone simply by applying the rule a second time to 
the appropriate sublattice, and we can recover any earlier state by undoing enough 
steps. 
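The checkerboard rule and its two claimed properties can be checked directly on a small lattice. The following is a minimal sketch, assuming periodic boundaries and an even lattice width (both are choices made here for illustration, not taken from the text); the function names are hypothetical. It counts antiparallel bonds as a stand-in for the bond energy, which is enough because the total energy is a fixed linear function of that count.

```python
import random

def ising_half_step(grid, parity):
    """Update, in place, every site whose (row + col) % 2 equals parity."""
    n = len(grid)  # assumed even, so the checkerboard survives the wraparound
    for r in range(n):
        for c in range(n):
            if (r + c) % 2 != parity:
                continue
            ones = (grid[(r - 1) % n][c] + grid[(r + 1) % n][c]
                    + grid[r][(c - 1) % n] + grid[r][(c + 1) % n])
            if ones == 2:  # flipping leaves this site's bond energy unchanged
                grid[r][c] ^= 1

def antiparallel_bonds(grid):
    """Count nearest-neighbor bonds joining unequal spins (each bond once)."""
    n = len(grid)
    return sum((grid[r][c] != grid[r][(c + 1) % n])
               + (grid[r][c] != grid[(r + 1) % n][c])
               for r in range(n) for c in range(n))

def run(grid, steps):
    """Alternate sublattices: parity 0, 1, 0, 1, ..."""
    for t in range(steps):
        ising_half_step(grid, t % 2)

def run_backwards(grid, steps):
    """Undo run() by re-applying the same half-steps in reverse order."""
    for t in reversed(range(steps)):
        ising_half_step(grid, t % 2)

random.seed(0)
n = 16
grid = [[1 if random.random() < 0.1 else 0 for _ in range(n)] for _ in range(n)]
start = [row[:] for row in grid]

bonds_before = antiparallel_bonds(grid)
run(grid, 101)
bonds_after = antiparallel_bonds(grid)
run_backwards(grid, 101)

print("bond count conserved:", bonds_before == bonds_after)
print("initial state recovered:", grid == start)
```

Updating in place is safe here only because every neighbor of an updated site lies on the other sublattice; with an odd lattice width and wraparound that would no longer hold.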


This example demonstrates that we can simultaneously capture several basic 
aspects of physics in an exact digital model. First of all, the finite-state character 
of a quantum spin system is captured by using classical bits. Next, spatial locality 
is captured by making the rule at each lattice site depend only on nearby neigh- 
bors. Finally, both energy conservation and invertibility are captured by splitting 
the updating process into two phases, and alternately looking at one half of the 
bits while changing the other half. By making only changes that conserve a bond 
energy locally, we conserve energy globally. By making the change at each site be a 
permutation operation that depends only upon unchanged neighbor information, we 
can always go backwards by taking the same neighbor information and performing 
the inverse permutation. 


Figure 18.1a shows the state of the Ising CA after 100,000 steps of time evolution 
on a 512x512 lattice, when started from a randomly generated configuration of site 
values that consisted of 10% 1’s and 90% 0’s. This is an equilibrium configuration: 
If we compare this to the configuration after 100,000,000 steps, the picture looks 
qualitatively the same. This equilibrium configuration is divided about equally 
between large domains that are mostly 0’s, and large domains that are mostly 1’s.


Fig. 18.2. Ising-like CA’s in 1D and 3D. (a) Time history of Bennett’s 1D CA. (b) A 3D
Ising CA cooled with a heatbath.


Since bond energy is conserved, the total length of boundary between regions of 0’s 
and regions of 1’s must be unchanged from that in the initial configuration—the 
numbers of 0’s and 1’s are not themselves conserved. 


This CA has some surprising behavior when started from a more ordered initial 
state: It supports the continuum wave equation in an exact fashion. Figure 18.1b 
illustrates a wave on the boundary between two pure domains (all 0’s, or all 1’s). If 
we hold the values at the edges of the lattice fixed, then we find that the boundary 
shown behaves like a standing wave, oscillating in a harmonic fashion that repeats 
forever without any damping. In fact, it is easy to show that any waveform that we 
set up along this diagonal boundary—as long as it isn’t too steep—exactly obeys the 
wave equation (cf. [43]). To see this, notice (Figure 18.1c) that the boundary between
the two domains consists of a sequence of vertical and horizontal line-segments each 
the height or width of one site. If we number these segments sequentially along the 
boundary, then it is easy to verify that, at each update of the lattice, all of the even- 
numbered segments move one position along the boundary in one direction, while 
all of the odd-numbered segments move one position in the opposite direction. Thus 
the shape of the boundary is exactly the superposition of two discrete waveforms 
moving in opposite directions. 


Similar techniques to those used in the Ising CA give a variety of related CA 
models [19, 21, 62, 90]. For example, in Figure 18.2a we show the time-history
of a 1D rule invented by Charles Bennett that has exactly the same bond-energy
conservation that we’ve just seen [74]. In Bennett’s CA, instead of 1 bit at each site
we put 2 bits, which we’ll call A_i and B_i. The A’s and B’s will play the roles of the
two sublattices in the Ising CA. We first update all of the A’s in parallel, holding
the B’s fixed, and then vice versa. For each A_i, its neighbors along the 1D chain
will be the two B’s on either side of it: B_{i-2}, B_{i-1}, B_{i+1}, and B_{i+2}. Our rule is the
same as before: We complement an A_i if exactly half of its four neighbors are 1’s,
and half are 0’s. Once we have updated all of the A’s, then we update the B’s in 
the same manner, using the A’s as neighbors. If we consider that there is a bond 
between each “spin” and its four neighbors, then we are again flipping the spin only 
if it doesn’t change the total bond energy. If we update the same sublattice twice 
in a row, the net effect is no change: The rule is invertible, exactly like the Ising 
CA. 
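The 1D rule just described can be written out in a few lines. This is a sketch under stated assumptions: periodic boundaries as in the figure, a much smaller lattice than the 512 sites used there, and hypothetical function names. Each A bit is bonded to the two B bits on either side of it, and the count of antiparallel bonds serves as the conserved bond energy.

```python
import random

OFFSETS = (-2, -1, 1, 2)  # an A bit at site i sees the B bits at i-2, i-1, i+1, i+2

def half_step(target, other):
    """Complement each target bit whose four neighbors in `other` are half 1's."""
    n = len(target)
    for i in range(n):
        ones = sum(other[(i + d) % n] for d in OFFSETS)
        if ones == 2:
            target[i] ^= 1

def step(a, b):
    half_step(a, b)  # update all A's with the B's held fixed
    half_step(b, a)  # then all B's with the A's held fixed

def unstep(a, b):
    half_step(b, a)  # undo the B update first
    half_step(a, b)  # then undo the A update

def antiparallel_bonds(a, b):
    """Count A-B bonds joining unequal bits (each bond counted once)."""
    n = len(a)
    return sum(a[i] != b[(i + d) % n] for i in range(n) for d in OFFSETS)

random.seed(0)
n = 64
a, b = [0] * n, [0] * n
for i in range(n // 2 - 4, n // 2 + 4):  # random patch near the center
    a[i], b[i] = random.randint(0, 1), random.randint(0, 1)
a0, b0 = a[:], b[:]

bonds_before = antiparallel_bonds(a, b)
for _ in range(200):
    step(a, b)
bonds_after = antiparallel_bonds(a, b)
for _ in range(200):
    unstep(a, b)

print("bond count conserved:", bonds_before == bonds_after)
print("initial state recovered:", (a, b) == (a0, b0))
```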


In the figure, our 1D lattice is 512 sites wide, with periodic boundaries (joined 
at the edges). We started the system with all sites empty except for a patch of 
randomly set bits in sites near the center. Time advances upward in the figure, and 
we show a segment of the evolution after about 100,000 steps. Rather than show 
the domains directly, we show all bonds that join antiparallel spins—the number of 
such “domain boundary” bonds is not changed by the dynamics. Note the variety 
of “particle” sizes and speeds. 


In Figure 18.2b, we show a 3D Ising dynamics with a heat bath. Here the 
rule is an invertible 3D checkerboard Ising CA similar to our 2D version, except 
that at every site in our 3D lattice we have added a few extra heatbath bits. The 
heatbath bits at each site record a binary number that is interpreted as an energy. 
Now our invertible rule is again “flip whenever it is energetically allowed.” As long 
as the heatbath energy at a given site is not too near its maximum, then a spin 
flip that would lower the bond energy is allowed, because we can put the energy 
difference into the heatbath. Similarly with transitions that would raise the bond 
energy. This heatbath-CA technique is due to Michael Creutz [19]. He thought of 
the bond energy as being potential energy, and the heatbath energy as being the 
associated kinetic energy. This heatbath CA is perfectly invertible, since applying 
the dynamics twice to the same sublattice leaves both the spins and the heatbath 
unchanged. 
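A heatbath update of this kind can be sketched as follows. To keep the example short it uses a 2D lattice rather than the 3D system of the figure, and it makes several assumptions that are not from the text: energy is measured in units where a parallel bond costs 0 and an antiparallel bond costs 1, each site carries a heatbath value between 0 and `cap` (here 7, i.e. three bits), and boundaries are periodic. The names are hypothetical.

```python
import random

def neighbors(spins, r, c):
    n = len(spins)
    return (spins[(r - 1) % n][c], spins[(r + 1) % n][c],
            spins[r][(c - 1) % n], spins[r][(c + 1) % n])

def heatbath_half_step(spins, heat, parity, cap=7):
    """Flip each spin on one sublattice whenever the local heatbath can
    absorb or supply the resulting change in bond energy."""
    n = len(spins)
    for r in range(n):
        for c in range(n):
            if (r + c) % 2 != parity:
                continue
            k = sum(v != spins[r][c] for v in neighbors(spins, r, c))
            delta = 4 - 2 * k  # bond-energy change if this spin flips
            new_heat = heat[r][c] - delta
            if 0 <= new_heat <= cap:
                spins[r][c] ^= 1
                heat[r][c] = new_heat

def total_energy(spins, heat):
    """Bond energy (antiparallel bond count) plus all heatbath energy."""
    n = len(spins)
    bonds = sum((spins[r][c] != spins[r][(c + 1) % n])
                + (spins[r][c] != spins[(r + 1) % n][c])
                for r in range(n) for c in range(n))
    return bonds + sum(sum(row) for row in heat)

random.seed(0)
n = 12
spins = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]
heat = [[2] * n for _ in range(n)]

energy_before = total_energy(spins, heat)
for t in range(60):
    heatbath_half_step(spins, heat, t % 2)
energy_after = total_energy(spins, heat)

# Applying the rule twice to the same sublattice should change nothing.
spins_snapshot = [row[:] for row in spins]
heat_snapshot = [row[:] for row in heat]
heatbath_half_step(spins, heat, 0)
heatbath_half_step(spins, heat, 0)
twice_is_identity = (spins == spins_snapshot and heat == heat_snapshot)

print("total energy conserved:", energy_before == energy_after)
print("double update is the identity:", twice_is_identity)
```

The double update is the identity because a flip that was allowed moves the heatbath to a value from which the reverse flip is also allowed, while a flip that was refused sees exactly the same neighbors and heatbath the second time.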


By adjusting the energy in the heatbath portion of this 3D CA, we can directly 
control the temperature of our system. We simply stop the simulation for a moment 
while we reach into our system and reset the heatbath values—without changing the 
spin values. As we cool the system in this way, energy will be extracted from bonds, 
and so if (for example) ε_≠ > ε_=, then there will be fewer domain boundaries and the
domains will grow larger. The system shown has been cooled in this manner, and 
we render the interface between the up and down spins. 


Figure 18.3 shows another Ising-like CA defined on a 3D cubic lattice. As in 
our 2D Ising CA, we have only one bit of state at each lattice site. Each site has 
a bond with each of its six nearest neighbors, and we perform a 3D checkerboard 
updating. This time our rule is, “flip a given spin if its six neighbors all have the 
same value: Six 0’s or six 1’s.” We'll call this the “Same” rule. If we label half of the 


Fig. 18.3. An Ising-like 3D CA. (a) A macroscopic equilibrium configuration. (b) The 
same configuration, with the front half of the ball removed. 


bonds attached to each site as “antiferromagnetic” (i.e., the energy values associated 
with parallel and antiparallel spins are interchanged for these labeled bonds), then 
this rule again conserves the total bond energy. Notice, though, that there are 
many different ways of labeling half of the bonds, and each way corresponds to 
a different additively conserved energy. We need to use several of these energies 
simultaneously if we want to express the Same rule as “flip whenever permitted by 
energy conservation.” 
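The claim that the Same rule conserves a bond energy once half of the bonds are relabeled can be checked numerically. The sketch below picks one particular labeling, which is an assumption made here and only one of the many possible choices mentioned above: along each axis, the bond leaving a site with an even coordinate in the positive direction is treated as antiferromagnetic. It also assumes periodic boundaries, an even lattice width, and energy units in which an ordinary bond costs 1 when antiparallel and a relabeled bond costs 1 when parallel. Names are hypothetical.

```python
import random

def same_half_step(grid, parity):
    """3D checkerboard update: flip a spin if all six neighbors are equal."""
    n = len(grid)
    for x in range(n):
        for y in range(n):
            for z in range(n):
                if (x + y + z) % 2 != parity:
                    continue
                total = (grid[(x - 1) % n][y][z] + grid[(x + 1) % n][y][z]
                         + grid[x][(y - 1) % n][z] + grid[x][(y + 1) % n][z]
                         + grid[x][y][(z - 1) % n] + grid[x][y][(z + 1) % n])
                if total in (0, 6):
                    grid[x][y][z] ^= 1

def labeled_energy(grid):
    """Bond energy with alternate bonds along each axis relabeled."""
    n = len(grid)
    energy = 0
    for x in range(n):
        for y in range(n):
            for z in range(n):
                s = grid[x][y][z]
                forward = ((x, grid[(x + 1) % n][y][z]),
                           (y, grid[x][(y + 1) % n][z]),
                           (z, grid[x][y][(z + 1) % n]))
                for coord, other in forward:
                    antiparallel = (s != other)
                    relabeled = (coord % 2 == 0)
                    # a relabeled bond has its two energy values interchanged
                    energy += int(antiparallel != relabeled)
    return energy

random.seed(0)
n = 6
grid = [[[0] * n for _ in range(n)] for _ in range(n)]
for x in range(2, 4):
    for y in range(2, 4):
        for z in range(2, 4):
            grid[x][y][z] = random.randint(0, 1)  # small random block in the center
start = [[row[:] for row in plane] for plane in grid]

energy_before = labeled_energy(grid)
for t in range(40):
    same_half_step(grid, t % 2)
energy_after = labeled_energy(grid)
for t in reversed(range(40)):
    same_half_step(grid, t % 2)

print("labeled energy conserved:", energy_before == energy_after)
print("initial state recovered:", grid == start)
```

With this labeling every site has three ordinary and three relabeled bonds, so when all six neighbors agree a flip simply exchanges which three bonds carry the energy.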


The system in Figure 18.3a is 512x512x512 and was started from an empty
space (all 0’s) with a small random block of spin values in the center. After about 
5000 steps of time evolution, this invertible system settles into the ball shown, 
which then doesn’t change further macroscopically. Microscopically it must keep 
changing—otherwise if we ran the system backwards it couldn’t tell when to start 
changing again and unform the ball. The local density of 1’s defines the surface 
that is being rendered. In Figure 18.3b we remove the front half of the ball to show 
its interior structure. The analogous rule in 2D does not form stable “balls.” 


It is easy to define other energy-conserving invertible Ising-like CA’s. We could, 
for example, take any model that has a bond energy defined, find a sublattice of 
sites that aren’t directly connected to each other by bonds, and update those sites 
in an invertible and energy conserving manner, holding their neighbors fixed. By 
running through a sequence of such sublattices, we would eventually update all sites. 
We could also make CA models with the same energy conservations with just two 
sublattices, by using the technique illustrated in Bennett’s CA. Simply duplicate 
the state at each site in the original model, calling one copy A;, and the other B;. 
If the A’s are only bonded to the B’s and vice versa, then we can update half of 
our system in parallel, while holding all neighbors that they depend on fixed. Of
course we can construct additional invertible energy-conserving rules by taking any 
of these examples and forbidding some changes that are energetically allowed. 


18.3 Simple CA’s with arbitrarily complex behavior


When the Ising model was first conceived in the 1920’s, it was not thought of as a 
computer model: There were no electronic computers yet! It was only decades later 
that it and other discrete lattice models could begin to be investigated on computers. 
One of the first to think about such models was John von Neumann [16, 97]. He
was particularly interested in using computer ideas to construct a mechanical model 
that would capture certain aspects of biology that are essential for reproduction and 
evolution. What he constructed was a discrete world in which one could arrange 
patterns of signals that act much like the logic circuitry in a computer. Just as 
computer programs can be arbitrarily complex, so too could the animated patterns 
in his CA world. Digital “creatures” in his digital universe reproduced themselves 
by following a digital program. This work anticipated the discovery that biological 
life also uses a digital program (DNA) in order to reproduce itself. 


As we will see below, the level of complexity needed in a CA rule in order to 
simulate arbitrary patterns of logic and hence universal computation is quite low. 
In physics, this same possibility of building arbitrarily complicated mechanisms out 
of a fixed set of components seems to be an essential property of the evolution that 
built us. Is this the only essential property of evolution? 


In a paradoxical sense, computation universality gives us so much that it really 
gives us very little. Once we have computation universality, we have a CA that 
is just as powerful as any conventional computer! By being able to simulate the 
logic circuitry of any computer, given enough time and space, any universal CA can 
compute exactly the same things as any ordinary computer. It can play chess. It 
can simulate quantum mechanics. It can even perform a simulation of a Pentium 
Processor running Tom Ray’s Tierra evolutionary-simulation program [77], and thus 
we know that it is capable of exhibiting Darwinian evolution. But if we don’t put 
in such an unlikely initial state by hand, is evolution of interesting complexity 
something that we are ever likely to see? Is it a robust property of the dynamics? 


Nature has computation universality along with locality, exact conservations and 
many other constraints. Which of these constraints are important for promoting 
evolution, and whether it is possible to capture all of the important constraints si- 
multaneously in a CA model, are both interesting questions. Here we will examine 
a well-known CA rule that is universal, and then discuss some physical constraints 
that it lacks that might make it a better candidate as a model for Darwinian evo- 
lution. 


In Figure 18.4 we illustrate Conway’s Game of Life, probably the most widely
known CA rule [7]. This is a CA that involves a 2D square lattice with one bit at
each lattice site, and a rule for updating each site that depends on the total number
of 1’s present in its eight nearest neighboring sites. If the total of its neighbors is 3,
a given site becomes a 1; if the total is 2, the site remains unchanged; in all other
cases the site becomes a 0. This rule is applied to all sites simultaneously.


Fig. 18.4. Conway’s non-invertible “Game of Life” CA (128x128 closeups taken from a
2Kx2K space). (a) One thousand steps into an evolution started from a random configuration
of 1’s and 0’s. (b) The same region after 16 thousand steps; the evolution has
settled down into small uncoupled repeating patterns. (c) A configuration that started
with two “glider guns.”


The Life rule is clearly non-invertible since, for example, an isolated 1 surrounded 
by 0’s turns into a 0: You cannot then tell from the resulting configuration of site 
values whether that site was a 0 or a 1 in the previous configuration. 
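The Life rule and the non-invertibility argument are easy to make concrete. The sketch below assumes periodic boundaries (an illustrative choice) and a hypothetical function name; it shows two different configurations, an empty lattice and a lattice holding one isolated 1, that have the same successor.

```python
def life_step(grid):
    """One synchronous update of Conway's Life on a periodic square lattice."""
    n = len(grid)
    new = [[0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            total = sum(grid[(r + dr) % n][(c + dc) % n]
                        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                        if (dr, dc) != (0, 0))
            if total == 3:
                new[r][c] = 1           # becomes a 1
            elif total == 2:
                new[r][c] = grid[r][c]  # remains unchanged
            # in all other cases the site becomes (stays initialized to) 0
    return new

n = 8
empty = [[0] * n for _ in range(n)]
isolated = [[0] * n for _ in range(n)]
isolated[3][3] = 1

# Two distinct predecessors with one and the same successor:
same_successor = (life_step(empty) == life_step(isolated))
print("distinct states merge:", empty != isolated and same_successor)
```

Because the update builds a fresh lattice from the old one, all sites are effectively updated simultaneously, as the rule requires.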


If you fill your computer screen with a random pattern of 0 and 1 pixels, and run 
the Life dynamics on it at video rates, then you see a lively churning and boiling 
pattern of activity, dying down in places to a scattering of small-period oscillating 
structures, and then being reignited from adjacent areas (Figure 18.4a). If you 
speed up your simulation to a few hundred frames per second, then typically after 
less than a minute for a 2Kx2K system all of the interesting activity has died 
out, and the pattern has settled down into a set of isolated small-period oscillators 
(Figure 18.4b). 


If you watch the initial activity closely, however, and pick out some of the inter- 
esting dynamical structures that arise, you can “build” configurations containing 
constructs such as the ones in Figure 18.4c. These are called glider guns. When 
the Life dynamics is applied to a glider gun, at regular intervals the gun spits out a 
small pattern that then goes through a short cycle of shapes, with the same shape 
reappearing every few steps in a shifted position. These gliders are the smallest 
moving objects in the Life universe. By putting together such constructs, one can 
show how to build arbitrary logic circuits, using sequences of gliders as the signals 
that travel around and interact [7, 36].


This then is our first example of a universal CA rule. Many other non-invertible 
universal CA rules are known—the simplest is due to Roger Banks [90]. All of these 
can support arbitrary complexity if you rig up a special enough initial state. Life 
is notable because it spontaneously develops interesting complexity starting from 
most initial states. Small structures that do something recognizable occasionally 
appear briefly, before being sucked back into the digital froth. 


18.4 Invertible CA’s are more interesting 


One problem with Conway’s Life as a model of evolution is that it lasts for such 
a short time when started from generic initial conditions. For a space of 2Kx2K 
bits, there are 2^(4,194,304) distinct possible configurations, and this rule typically goes
through fewer than 2^14 of them before repeating a configuration and entering a
cycle. This doesn’t allow much time for the evolution of complexity! Furthermore, 
useful computing structures in Life are very fragile: Gliders typically vanish as soon 
as they touch anything. 


The short Life-time problem can be attributed largely to the non-invertible 
nature of the Life rule—invertible rules do not behave like this. We typically have 
no idea just how long the cycle times of our invertible CA’s actually are, because we 
have never seen them cycle, except from very special initial states or on very tiny 
spaces. The reason that invertible CA’s have such long cycle-times is actually the 
same as the reason that essentially all invertible information dynamics have long 
cycles: An invertible dynamics cannot repeat any state it has gone through until it 
first repeats the state it started in. In other words, if we run an invertible rule for 
a while, we know what the unique predecessor of every state we have already seen 
is, except for the first state—its predecessor is still coming up! Thus an invertible 
system is forced to keep sampling distinct states until it stumbles onto its initial 
state. Since there is nothing forcing it toward that state as it explores its state 
space, the cycle of distinct states is typically enormously long: If our invertible CA 
really did sample states at random without repetition, it would typically have to 
go through about half of all possible states before chancing upon its initial state 
[91]. A non-invertible system doesn’t have this constraint, and can re-enter its past 
trajectory at any point [49]. The moral here is that if you want to make a discrete 
world that lasts long enough to do interesting things, it is a good idea to make it 
invertible. As a bonus, a more thorough exploration of the available state-space 
tends to make a system more amenable to statistical mechanical analysis. 
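The difference between the two kinds of dynamics can be illustrated on an abstract state space. The sketch below is only a toy model and not a CA: it treats an invertible dynamics as a random permutation of N states and a non-invertible one as a random map, both of which are modeling assumptions, and the names are hypothetical. For a uniformly random permutation the cycle through a given state has a length that is uniformly distributed between 1 and N, which is the source of the "about half of all possible states" estimate; a random map typically revisits a state after a number of steps on the order of the square root of N.

```python
import random

def tail_and_cycle(step, start):
    """Iterate `step` from `start` until some state repeats.

    Returns (tail_length, cycle_length): the number of states visited
    before entering the cycle, and the length of the cycle itself.
    """
    first_seen = {}
    state, t = start, 0
    while state not in first_seen:
        first_seen[state] = t
        state = step(state)
        t += 1
    return first_seen[state], t - first_seen[state]

random.seed(0)
N = 10_000

perm = list(range(N))
random.shuffle(perm)                               # invertible: a permutation
rmap = [random.randrange(N) for _ in range(N)]     # generally not invertible

perm_tail, perm_cycle = tail_and_cycle(lambda s: perm[s], 0)
map_tail, map_cycle = tail_and_cycle(lambda s: rmap[s], 0)

print("permutation: tail", perm_tail, "cycle", perm_cycle)
print("random map:  tail", map_tail, "cycle", map_cycle)
```

For the permutation the tail length is always zero: the first repeated state is necessarily the starting state, exactly as argued above. The random map is free to rejoin its own trajectory at any earlier point, so it can have a nonzero tail.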


To make it easier to capture physical properties in CA’s, we will use a technique
called partitioning, which was developed specifically for this purpose [60, 90]. This
technique is closely related to the sublattice technique introduced in Section 18.2
[62, 93]. The idea of partitioning is to divide up all of the bits in our CA system
into disjoint local groupings, so that each bit is part of only one group. Then we update
all of the groups independently, before changing the groupings and repeating the
process. Changing the groupings allows information to propagate between groups.
If the updating of each group conserves the number of 1’s, for example, then so
does the global dynamics. If the updating applied independently to each group
is an invertible function, then the global dynamics is also invertible. Since all
invertible CA’s can be reexpressed isomorphically in a partitioning format, where
conservations and invertibility are manifest, this is a particularly convenient format
to use for our models [47, 48, 93].


Fig. 18.5. The invertible “Critters” CA. (a) The solid and dotted blockings are used
alternately. (b) The Critters rule.
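The partitioning idea can be shown in its simplest setting. The sketch below is a 1D toy, which is an assumption made for brevity (the models in this chapter are 2D and 3D): bits are grouped into adjacent pairs, the pairing is shifted by one site on alternate steps, and each pair is replaced according to a lookup table. The two example tables, `SWAP` and `CYCLE`, and all function names are hypothetical. Any table that is a permutation of the four pair states gives an invertible global dynamics; `SWAP` also conserves the number of 1’s, while `CYCLE` does not.

```python
import random

def partition_step(cells, block_rule, offset):
    """Apply block_rule independently to each pair of adjacent cells.

    offset 0 pairs cells (0,1), (2,3), ...; offset 1 pairs (1,2), (3,4), ...
    The lattice is periodic and its length is assumed to be even.
    """
    n = len(cells)
    out = cells[:]
    for start in range(offset, offset + n, 2):
        i, j = start % n, (start + 1) % n
        out[i], out[j] = block_rule[(cells[i], cells[j])]
    return out

def invert_rule(block_rule):
    """Invert a block rule; fails unless the rule is a permutation."""
    inverse = {result: block for block, result in block_rule.items()}
    if len(inverse) != len(block_rule):
        raise ValueError("block rule is not invertible")
    return inverse

def run(cells, block_rule, steps):
    for t in range(steps):
        cells = partition_step(cells, block_rule, t % 2)
    return cells

def run_backwards(cells, block_rule, steps):
    inverse = invert_rule(block_rule)
    for t in reversed(range(steps)):  # same groupings, in reverse order
        cells = partition_step(cells, inverse, t % 2)
    return cells

# Exchanges the two bits of a pair: invertible and conserves the number of 1's.
SWAP = {(0, 0): (0, 0), (0, 1): (1, 0), (1, 0): (0, 1), (1, 1): (1, 1)}
# Cycles through the four pair states: invertible but not 1's-conserving.
CYCLE = {(0, 0): (0, 1), (0, 1): (1, 1), (1, 1): (1, 0), (1, 0): (0, 0)}

random.seed(0)
cells = [random.randint(0, 1) for _ in range(32)]

swapped = run(cells, SWAP, 37)
cycled = run(cells, CYCLE, 37)

print("SWAP conserves 1's:", sum(swapped) == sum(cells))
print("SWAP reversed:", run_backwards(swapped, SWAP, 37) == cells)
print("CYCLE reversed:", run_backwards(cycled, CYCLE, 37) == cells)
```

Both properties are inherited from the block rule alone, which is why they are manifest in this format: no analysis of the global dynamics is needed.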


Our first example of a partitioned CA is called “Critters.” This is a universal 
invertible CA that evolves interesting complexity. The Critters rule uses a 2x2 
block partition on a 2D square lattice. In Figure 18.5a we show an 8x8 region 
of the lattice—each square represents a lattice site that can hold a 0 or a 1. The 
solid lines show the grouping of the bits into 2x2 blocks that is used on the even 
time-steps, the dotted lines show the odd-time grouping. The Critters rule is shown 
in Figure 18.5b. This same rule is used for both the even-time grouping and the 
odd-time grouping. All possible sets of initial values for the four bits in a 2x2 
block are shown on the left, the corresponding results are shown on the right. The 
rule is rotationally symmetric, so not all cases need to be shown explicitly: Each of 
the four discrete rotations of a block that is shown on the left turns into the same 
rotation of the corresponding result-block shown on the right. 


Notice that each of the 16 possible initial states of a block is turned into a distinct 
result state. Thus the Critters rule is invertible. Notice also that the number of 
1’s in the initial state of each block is, in all cases, equal to the number of 0’s in 
the result. Thus this property is true for each update of the entire lattice. If we 
call 1’s particles on even steps, and call 0’s particles on odd steps, then particles 


Fig. 18.6. A Critters simulation. (a) The initial state of the full 2K x2K lattice. (b) The 
state after 1,000,000 steps. (c) A closeup of a region on the right. 


are conserved by this dynamics. Notice that the Critters rule also conserves the 
parity (sum mod 2) along each diagonal of each block, which leads to conservation 
of parity along every second diagonal line running across the entire space. 
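

The properties claimed above can be checked mechanically. Since Figure 18.5b is not
reproduced here, the sketch below assumes a commonly described form of the Critters
rule: a block with exactly two 1’s is left alone, every other block is complemented, and
a block with three 1’s is additionally rotated by 180 degrees. The variant drawn in the
figure may differ in detail, and all names are our own.

```python
from itertools import product

ALL_BLOCKS = list(product((0, 1), repeat=4))   # blocks as (tl, tr, bl, br)

def critters(block):
    """Assumed form of the Critters rule (the figure itself is not reproduced here)."""
    ones = sum(block)
    if ones == 2:
        return block                            # two 1's: leave the block alone
    flipped = tuple(1 - bit for bit in block)   # otherwise complement every bit
    if ones == 3:
        return flipped[::-1]                    # three 1's: also rotate by 180 degrees
    return flipped

def rot90(block):
    """Rotate a block a quarter turn clockwise."""
    tl, tr, bl, br = block
    return (bl, tl, br, tr)

def diagonal_parities(block):
    """Parity (sum mod 2) along each of the two diagonals of a block."""
    tl, tr, bl, br = block
    return ((tl + br) % 2, (tr + bl) % 2)

# Each flag corresponds to one property stated in the text.
invertible = len({critters(b) for b in ALL_BLOCKS}) == 16
ones_become_zeros = all(sum(b) == 4 - sum(critters(b)) for b in ALL_BLOCKS)
parity_conserved = all(diagonal_parities(b) == diagonal_parities(critters(b))
                       for b in ALL_BLOCKS)
rotation_symmetric = all(critters(rot90(b)) == rot90(critters(b)) for b in ALL_BLOCKS)
```

For this assumed rule all four flags come out true, so it has the invertibility,
particle conservation, diagonal parity and rotational symmetry described in the text.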


It is not interesting to run an invertible rule such as Critters starting from a 
completely random initial state, as we did in the case of Life. This is because the 
vast majority of all possible states are random-looking and so, by a simple counting 
argument, almost all of them have to turn into other random-looking states. To 
see this, note that any given number of steps of an invertible dynamics must turn 
each distinct initial state into a distinct final state. Since the set of states with 
recognizable structure is such a tiny subset of the set of all possible states, almost 
every random-looking state must turn into another random-looking state. 


Thus instead of a random state, a “generic” initial state for an invertible CA will 
be some easily generated “low-entropy” state—we saw several examples of invertible 
evolutions from such states in Section 18.2. For the Critters CA, we show a sample 
simulation started from an empty 2K x2K lattice with a randomly filled 512x512 
block of 0’s and 1’s in the middle (Figure 18.6a). In Figure 18.6b we see the state 
after 1,000,000 updates of the entire space. In this simulation, opposite edges of 
the space have been connected together (periodic boundaries). Figure 18.6c shows 
a closeup of a region near the right edge of the space: All of the structure present 
has arisen from collisions of small moving objects that emerged from the central 
region. In analogy to Life, we will call these small moving objects gliders. You can 
see several of these gliders in various phases of their motion in the closeup: They are 
the compact symmetrical structures composed of four particles, with two particles 
adjacent to each other, and two slightly separated (see also Figure 18.10). In the 
Critters dynamics, a glider goes through a cycle of four configurations each time it 
shifts by two positions. 


Unlike the gliders in Life, Critters gliders are quite robust. Consider, for example, 
what happens when two of these gliders collide in an empty region. At first 
they form a blob of eight particles that goes through some pattern of contortions. If 
nothing hits this blob for a while, we always see at least one of the gliders emerge. 
This property arises from the combination of conservation and invertibility: We can 
prove, from invertibility, that the blob must break up, but since the only moving 
objects we can make with eight particles are one or two gliders, then that’s what 
must come out. To see that the blob must break up, we can suppose the opposite. 
The particles that make up the blob can only get so far apart without the blob 
breaking up, and so there are only a finite number of possible configurations of the 
blob. The blob cannot repeat any configuration (and hence enter a cycle of states) 
because of invertibility: A local cycle of states would be stable going back in time 
as well as forward, but we know that the blob has to break up if we run backwards 
past the collision that formed it. Since the blob runs out of distinct configurations 
and cannot repeat, it must break up. At least one glider must come out. If the 
collision that formed the blob was rotationally symmetric, then both gliders must 
come out, since the dynamics is also rotationally symmetric. The robustness of 
particles that we saw in Figure 18.2a arises in a similar manner. 


The Critters rule is fascinating to watch because of the complicated structures 
that form, with swarms of gliders bouncing around within them and slowly altering 
them. Sometimes, for example, a glider will hit a little flap left from a previous glider 
collision, flipping it from one diagonal orientation to another. This will affect what 
another glider does when it subsequently hits that flap. Gliders will hit obstacles 
and turn corners, sometimes going one way, sometimes another, depending on the 
details of the collisions. The pattern must gradually change, because the system 
as a whole cannot repeat. After a little observation it is clear that there are many 
ways to build arbitrary logic out of the Critters rule—one simple way is sketched 
in the next section in order to demonstrate this rule’s universality. 


Started from an ordered state, the Critters CA accumulates disorder for the same 
reason that a neat room accumulates disorder: Most changes make it messier. As 
we have already noted, in an invertible dynamics, a simple counting argument tells 
us that most messy states don’t get neater. Localized patterns of structured activity 
that may arise within this CA must deal with an increasingly messy environment. 
No such structure in an invertible world can take in inputs that have statistical 
properties that are unpredictable by it, and produce outputs that are less messy, 
again because of our counting argument. Thus it’s fair to call a measure of the 
messiness of an invertible CA world the total entropy. We can think of this total 
entropy as being approximated by the size of the file we would get if we took 
the whole array of bits that fills our lattice and used some standard compression 
algorithm on it. 
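

As a rough illustration of this compression-based estimate, the sketch below packs a
lattice into bytes and measures the size of the result after compression with Python’s
standard zlib module. The function names are hypothetical, and the choice of zlib as the
“standard compression algorithm” is our assumption; any general-purpose compressor
would serve the same purpose.

```python
import zlib

def pack_bits(grid):
    """Pack a 2D array of 0/1 values into bytes, eight sites per byte."""
    flat = [bit for row in grid for bit in row]
    flat += [0] * (-len(flat) % 8)          # pad the tail to a whole byte
    out = bytearray()
    for i in range(0, len(flat), 8):
        byte = 0
        for bit in flat[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

def entropy_estimate(grid):
    """Compressed size in bytes: a crude stand-in for the total entropy."""
    return len(zlib.compress(pack_bits(grid), 9))
```

An empty lattice compresses to a handful of bytes, a lattice with a small random patch
to somewhat more, and a fully random lattice hardly compresses at all, which matches
the ordering of messiness we would expect.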


It is possible to construct invertible CA’s in which a simple initial state turns 
into a completely random looking mess very quickly. While it is still true that 


Fig. 18.7. Fredkin’s Billiard Ball Model. (a) Balls heading toward a collision. (b) Paths 
taken in collision are displaced from straight paths. 


this invertible CA will probably take forever to cycle, it has found another way 
to end its interesting activity quickly—what we might call a rapid heat death. Of 
course heat death is the inevitable fate of any CA evolution that has a long enough 
cycle: Since the vast majority of states are random-looking, very long cycles must 
consist mostly of such states. We can, however, try to put off the inevitable. In 
the Critters CA, symmetries and conservation laws act as constraints on the rate of 
increase of entropy, and so make the interesting low-entropy phase of the dynamics 
last much longer. It would be interesting to try to capture within CA dynamics 
other mechanisms that occur in real physics that contribute to metastability and 
hence delay the heat death. 


18.5 A bridge to the continuum 


Historically, the partitioning technique used in the previous section was first de- 
veloped [60] for use in the construction of a very simple universal invertible CA 
modeled after Fredkin’s Billiard Ball Model (BBM) of computation [30].1 The 
BBM is a beautiful example of a continuum physical system that is turned into a 
digital system simply by constraining its initial conditions and the times at which 
we observe it. This makes it a wonderful bridge between the tools and concepts of 
continuum mechanics, and the world of exact discrete information dynamics. This 
model is discussed in [26], but we will review it very briefly here. 


In Figure 18.7a we show two finite-diameter billiard balls heading toward a 
collision, both moving at the same speed. Their centers are initially located at 


1The first universal invertible CA was actually constructed by Toffoli [86], who showed how 
to take a universal 2D CA that was non-invertible, and add a third dimension that would keep a 
complete time history of the dynamics, thus rendering it invertible. 


Fig. 18.8. A Billiard Ball Model CA. (a) The BBMCA rule. (b) A BBMCA circuit. 


integer coordinates on a Cartesian lattice—we will refer to these as lattice points. 
At regular intervals, the balls will be found at consecutive lattice points, until they 
collide. In Figure 18.7b we show a collision. The outer paths show the actual 
course that the balls take after a collision; the inner paths illustrate where each 
of the two balls would have gone if the other one wasn’t there to collide with it. 
Thus we see that a locus at which a collision might or might not happen performs 
logic: If the presence of a ball at a lattice point at an integer time is taken to 
represent a 1, and the absence a 0, then we get 1’s coming out on the outer paths 
only if balls at A AND B came in at the same time. The other output paths 
correspond to other logical functions of the inputs. It is easy to verify that this 
collision is a universal and invertible logic element (just reverse all velocities to run 
BBM circuits backwards). We also allow fixed mirrors in our model to help route 
ball-signals around the system—these are carefully placed so that the centers of 
balls are still always found at lattice points at integer times. 
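

The logic performed at a single collision locus can be written out as a small truth
table. The sketch below is our own illustration: the function name and the ordering of
the four outputs are assumptions, and the “other logical functions” are taken to be
A AND NOT B and B AND NOT A on the two straight-through paths, as in the usual
presentation of the model.

```python
def bbm_collision(a, b):
    """Outputs of one potential collision site, for input bits a and b.

    Returns (a_deflected, b_deflected, a_straight, b_straight):
    the deflected (outer) paths carry a ball only if both inputs were present,
    the straight (inner) paths carry a ball only if it arrived alone.
    """
    both = a & b
    return (both, both, a & (1 - b), b & (1 - a))

# The full truth table, keyed by the input pair (a, b).
TRUTH_TABLE = {(a, b): bbm_collision(a, b) for a in (0, 1) for b in (0, 1)}
```

Every input pair produces a different output pattern, and the number of balls leaving
always equals the number entering, which is what lets the element be run backwards.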


In order to make a simple CA model of the BBM, we will represent finite di- 
ameter balls in our CA by spatially separated pairs of particles, one following the 
other—the leading edge of the ball followed by the trailing edge. When such a ball 
collides with something, the front-edge particle collides first, and then the influ- 
ence is communicated to the trailing edge. This kind of “no rigid bodies” approach 
to collisions is more consonant with the locality of interaction that we are trying 
to capture in CA’s than a larger-neighborhood model in which distant parts of a 
fixed-size ball can see what’s going on right away. 


Figure 18.8a shows the BBMCA rule. Like the Critters rule, this rule is rota- 
tionally symmetric and so, again, only one case out of every rotationally equivalent 
set is shown. Note that the rule conserves 1’s (particles), and that only two cases 
change. This is the complete rule that is applied alternately to the even and odd 


Fig. 18.9. A BBMCA collision. We show successive snapshots of a small area where a 
collision is happening. In the first image, the solid-blocks are about to be updated. The 
blocking alternates in successive images. 


2x2 blockings. Note that, much like the Ising CA, this rule is its own inverse: If 
we simply apply the update rule to the same blocking twice in a row, the net effect 
is no change.2 
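

The BBMCA rule is small enough to state directly in code. The sketch below follows the
description in the text (a lone particle moves to the opposite corner, two particles on
a diagonal switch to the other diagonal, and nothing else changes); the function name
and the block ordering (tl, tr, bl, br) are our own conventions.

```python
from itertools import product

ALL_BLOCKS = list(product((0, 1), repeat=4))   # blocks as (tl, tr, bl, br)

def bbmca(block):
    """BBMCA block rule as described in the text."""
    tl, tr, bl, br = block
    if sum(block) == 1:
        return (br, bl, tr, tl)     # lone particle: move to the opposite corner
    if block in ((1, 0, 0, 1), (0, 1, 1, 0)):
        return (tr, tl, br, bl)     # two on a diagonal: switch to the other diagonal
    return block                    # every other case is unchanged

# The two properties noted in the text, checked over all 16 blocks.
conserves_particles = all(sum(bbmca(b)) == sum(b) for b in ALL_BLOCKS)
self_inverse = all(bbmca(bbmca(b)) == b for b in ALL_BLOCKS)
```

Both flags come out true: the rule conserves 1’s, and applying it twice to the same
blocking restores every block.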


Figure 18.9 shows a BBMCA collision between two minimum-size balls—the 
gap between the two particles that make up the front and back of the ball can 
be any odd number of empty sites. Until the balls get close together the particles 
that form them all propagate independently: A single 1 in one corner of a block 
moves to the opposite corner. When we change the blocking, the particle again 
finds itself alone in a block in the same corner it started in, and again moves in 
the same direction. When two leading-edge particles find themselves in the same 
block, the collision begins. These particles are stuck for one step—this case doesn’t 
change. Meanwhile the trailing edge particles catch up, each colliding head-on with 
a leading-edge particle which was about to head back to meet it (if the gap had 
been wider). New particles come out at right angles to the original directions, due 
to the “two-on-a-diagonal” case of the rule, which switches diagonal. Now one of 
the particles from each head-on collision becomes the new leading edge particle; 
these are done with the collision and head away from the collision locus, once again 
propagating independently of the trailing particles. Meanwhile the two new trailing- 
edge particles are headed toward each other. They collide and are stuck for one 
step before reflecting back the way they came, each following along the path already 
taken by a leading edge particle. Each two-particle ball has been displaced from its 
original path. If the other two-particle ball hadn’t been there, it would have gone 
straight. 
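

The free propagation that underlies this collision is easy to reproduce. The sketch
below is an illustration under our own assumptions (a small periodic lattice, rows
numbered downward, hypothetical helper names): it applies the BBMCA rule with
alternating blockings and follows a single isolated particle.

```python
def bbmca(block):
    """BBMCA block rule on a block (tl, tr, bl, br), as described in the text."""
    tl, tr, bl, br = block
    if sum(block) == 1:
        return (br, bl, tr, tl)     # lone particle: move to the opposite corner
    if block in ((1, 0, 0, 1), (0, 1, 1, 0)):
        return (tr, tl, br, bl)     # two on a diagonal: switch diagonal
    return block

def step(grid, t):
    """One update of a periodic lattice: even t uses the even blocking, odd t the odd one."""
    n, off = len(grid), t % 2
    new = [row[:] for row in grid]
    for r in range(off, n + off, 2):
        for c in range(off, n + off, 2):
            cells = [(i % n, j % n) for i in (r, r + 1) for j in (c, c + 1)]
            out = bbmca(tuple(grid[i][j] for i, j in cells))
            for (i, j), value in zip(cells, out):
                new[i][j] = value
    return new

def positions(grid):
    """Sorted list of (row, column) coordinates of all particles."""
    return sorted((r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v)

def track(start, steps, n=16):
    """Follow one isolated particle placed at `start` for the given number of steps."""
    grid = [[0] * n for _ in range(n)]
    grid[start[0]][start[1]] = 1
    path = [start]
    for t in range(steps):
        grid = step(grid, t)
        path.append(positions(grid)[0])
    return path
```

A particle started in the top-left corner of an even-time block, for example at (2, 2),
advances one site diagonally at every step, since after each change of blocking it
finds itself in the same kind of corner again. The same step function can be used to
replay the collision of Figure 18.9 by placing two two-particle balls on the lattice.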


Mirrors are built by placing square patterns of four particles straddling two 
adjacent blocks of the partition. It is easy to verify that such squares don’t change 
under this rule, even if you put them right next to each other. Single particles just 
bounce back from such mirrors. The collision of a two-particle ball with such a 
mirror looks just like the collision of two balls that we have already seen; we just 
replace one of the balls with a mirror whose surface lies along the axis of symmetry 


2The Ising CA is actually very closely related. It can be put into the same 2x2 block-partitioned 
format if we model bonds instead of sites [90]. 


Fig. 18.10. A BBMCA-style collision of pairs of gliders in the Critters CA. The images 
shown are not consecutive states of the lattice, but are instead spaced in time to correspond 
(with a 45° rotation) to the images in the previous figure. 


of the two-ball collision. The remaining ball can’t tell the difference. For more 
details about the BBMCA, see [62, 90]. 


Figure 18.8b shows a BBMCA circuit, computing a permutation sequence. Be- 
cause of their long cycle times, invertible circuits tend to make good pseudo-random 
number generators. In fact, a perfect random number generator would go through 
all of its internal states before cycling, and so it would be perfectly invertible. It 
is also interesting to use the BBMCA to construct circuits that are more directly 
analogous to thermodynamic systems, since the constraint of invertibility means 
that it is impossible to design a BBMCA circuit that, acting on unpredictable sta- 
tistical inputs that it receives, can reduce the entropy of those data—for the reasons 
discussed in the previous section [4, 6, 62]. The BBMCA is simple enough that 
it provides a good theoretical model to use for other inquiries about connections 
between physics and computation. For example, one can use its dynamics as the 
basis of quantum spin models [63]. 


Using Figure 18.10, we sketch a simple demonstration that the Critters rule of 
the previous section is universal—we show that pairs of Critters-gliders suitably ar- 
ranged can act just like the “balls” in the BBMCA, which is universal. Figure 18.10 
shows a collision that is equivalent to that shown in Figure 18.9. We don’t show ev- 
ery step of the Critters time-evolution; instead we show the pairs of gliders at points 
corresponding to those in the collision of Figure 18.9. Mirrors can be implemented 
by two single Critters particles, one representing each end of the mirror. 


18.6 Discrete molecular dynamics 


Having constructed a CA version of billiard ball dynamics, it seems natural to try to 
construct CA’s that act more like real gases [60, 89]. With enough particle directions 
and particle speeds, our discrete Molecular Dynamics (MD) should approximate a 
real gas. 


The BBMCA has just four particle directions and a single particle speed. We will 
make our first MD model by modifying the BBMCA rule. For simplicity, we won’t 


Fig. 18.11. A simple four-direction lattice gas. (a) A momentum conserving invertible 
rule. (b) A 512x512 lattice filled randomly with particles, with a square block of 1’s 
in the center. (c) A round pressure wave spreads out from the center. 


worry about modeling finite-diameter balls: Single 1’s will be our gas molecules. 
The simplest such rule would be, “During each 2x2 block update, each molecule 
ignores whatever else is in the block and simply moves to the opposite corner of 
its block.” Then, when we switch partitions, that molecule would again be back in 
the same kind of corner that it started in and so it would again move in the same 
direction—moving exactly like an isolated particle in the BBMCA. This simple rule 
gives us a non-interacting gas, with four directions and one speed. 


We would like to add a momentum conserving collision to this non-interacting 
gas rule. We can begin by defining what we mean by momentum. If we imagine 
that our discrete lattice dynamics is simply a succession of snapshots of a continuum 
system, as it was in the case of the BBM, then we automatically inherit definitions of 
energy and momentum from the continuous system. To add a momentum conserving 
collision to our simple four-direction gas, we should have two molecules that collide 
head-on come out at right angles. Figure 18.11a shows the non-interacting gas rule 
with one case modified to add this collision: When exactly two molecules appear 
on one diagonal of a block, they come out on the other diagonal. 
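

The rule of Figure 18.11a, as described in words above, can be checked for momentum
conservation directly. In the sketch below the names and the bookkeeping are our own
assumptions: a particle sitting in a corner before the update is taken to be moving
toward the opposite corner, and a particle sitting in a corner after the update is
taken to have just arrived from the opposite corner.

```python
from itertools import product

ALL_BLOCKS = list(product((0, 1), repeat=4))   # blocks as (tl, tr, bl, br)
CORNERS = ((0, 0), (0, 1), (1, 0), (1, 1))     # (row, column) of tl, tr, bl, br

def hpp(block):
    """Four-direction gas rule in 2x2 block form."""
    if block in ((1, 0, 0, 1), (0, 1, 1, 0)):
        tl, tr, bl, br = block
        return (tr, tl, br, bl)     # head-on pair leaves on the other diagonal
    return block[::-1]              # otherwise each molecule moves to the opposite corner

def momentum_in(block):
    """Net momentum before the update: each particle heads for the opposite corner."""
    return (sum(b * (1 - 2 * r) for b, (r, c) in zip(block, CORNERS)),
            sum(b * (1 - 2 * c) for b, (r, c) in zip(block, CORNERS)))

def momentum_out(block):
    """Net momentum after the update: each particle came from the opposite corner."""
    return (sum(b * (2 * r - 1) for b, (r, c) in zip(block, CORNERS)),
            sum(b * (2 * c - 1) for b, (r, c) in zip(block, CORNERS)))

conserves_particles = all(sum(hpp(b)) == sum(b) for b in ALL_BLOCKS)
conserves_momentum = all(momentum_in(b) == momentum_out(hpp(b)) for b in ALL_BLOCKS)
```

With this bookkeeping both flags are true for all sixteen blocks. The head-on pair has
zero net momentum before and after, so turning it through a right angle costs nothing.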


We would not expect such a simple model to behave very much like a real 
gas. In Figure 18.11b, we show a 512x512 2D space filled with a random pattern 
of 1’s and 0’s, with a square block in the center of the pattern that is all 1’s. 
Figure 18.11c shows this system after about 200 updates of the space: We see a 
round pressure wave. We were amazed when we first ran this simulation in the 
early 1980's [89]. How could we get such continuous looking behavior from such a 
simple model? This is the point at which we began to think that perhaps CA MD 
might be immediately practical for fluid modeling [59, 90]. Discrete lattice models 


Fig. 18.12. A slightly modified gas CA. The vertical bar is a density wave traveling to the 
right, the circle is a region in which waves are slower. We see the wave reflect and refract. 


are well adapted to meshes of locally interconnected digital hardware, which can be 
very large and very fast if the models are simple. It turns out, though, that this 
particular model is too simple to simulate fluid flow—though it is useful for other 
purposes. This four-direction lattice gas automaton (LGA) is now commonly known 
as the HPP gas after its originators [41], who analyzed it about a decade before 
we rediscovered it. Their analysis showed that this four-velocity model doesn’t give 
rise to normal isotropic hydrodynamics. 


Notice that the HPP gas is perfectly invertible—like the BBMCA, its rule is its 
own inverse. Thus we can run our pressure wave backwards, getting back exactly 
to the square block we started from. This doesn’t contradict what we said earlier 
about entropy in invertible CA’s, since a messy state can always be cleaned up if 
you undo exactly the sequence of actions that produced the mess. Normal forward 
time evolution doesn’t do this. 
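

Running the gas backwards can be demonstrated on a small lattice. The following sketch
is illustrative only: the lattice size, the seed, the number of steps and all names are
our own choices. Because the block rule is its own inverse, the reverse evolution simply
applies the same rule to the blockings in the opposite order.

```python
import random

def hpp(block):
    """Four-direction gas rule on a block (tl, tr, bl, br)."""
    if block in ((1, 0, 0, 1), (0, 1, 1, 0)):
        tl, tr, bl, br = block
        return (tr, tl, br, bl)     # head-on pair leaves on the other diagonal
    return block[::-1]              # otherwise move to the opposite corner

def step(grid, t):
    """One update of a periodic lattice: even t uses the even blocking, odd t the odd one."""
    n, off = len(grid), t % 2
    new = [row[:] for row in grid]
    for r in range(off, n + off, 2):
        for c in range(off, n + off, 2):
            cells = [(i % n, j % n) for i in (r, r + 1) for j in (c, c + 1)]
            out = hpp(tuple(grid[i][j] for i, j in cells))
            for (i, j), value in zip(cells, out):
                new[i][j] = value
    return new

def run(grid, times):
    """Apply the update for each time index in `times`, in the order given."""
    for t in times:
        grid = step(grid, t)
    return grid

def initial_state(n=32, seed=1):
    """Random lattice with a solid square of 1's in the middle."""
    rng = random.Random(seed)
    grid = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
    for r in range(n // 2 - 4, n // 2 + 4):
        for c in range(n // 2 - 4, n // 2 + 4):
            grid[r][c] = 1
    return grid

STEPS = 25
start = initial_state()
later = run(start, range(STEPS))                   # forward in time
restored = run(later, reversed(range(STEPS)))      # same rule, blockings in reverse order
```

Here restored is identical to start: undoing exactly the sequence of block updates
recovers the square block, however disordered the intermediate state looks.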


Once we are able to model one macroscopic phenomenon, it is often obvious how 
to model other phenomena. Starting from a model with sound waves, we can make 
a model with reflection and refraction of such waves. In Figure 18.12 we show a 
simulation using a 2-bit variant of the HPP CA. Here we have added a bit to each 
site, and used it to mark a circular region of the space: One bit at each site is a 
gas bit, and the other bit is a mark bit. We now alternate the rule with time so 
that, for one complete even-time/odd-time update of the lattice we apply the HPP 
rule to the gas bits at all sites; then we do a complete even/odd update only of 
unmarked blocks, with all gas particles in blocks containing non-zero mark-bits left 
unchanged. This gives us two connected HPP systems, one running half as fast as 
the other. In particular, waves travel half as fast in the marked region, and so we 
get refraction of the wave that is incident from the left. Notice that the dynamics is 
still perfectly invertible, and that we can make our “lens” any shape we desire—it is 
a general feature of CA MD that we can simply “draw” whatever shaped obstacles 
and potentials we need in a simulation [83]. Related LGA techniques have been 
used for complex antenna simulations [81]. 
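

The alternating schedule used for the refraction example can be sketched as follows.
This is a minimal illustration under our own assumptions: the gas bits and mark bits are
kept in two separate arrays, the lattice is periodic, and the names masked_step, cycle
and cycle_back are hypothetical.

```python
def hpp(block):
    """Four-direction gas rule on a block (tl, tr, bl, br)."""
    if block in ((1, 0, 0, 1), (0, 1, 1, 0)):
        tl, tr, bl, br = block
        return (tr, tl, br, bl)
    return block[::-1]

def masked_step(gas, marks, offset, skip_marked):
    """Update every 2x2 block of gas bits, optionally leaving marked blocks unchanged."""
    n = len(gas)
    new = [row[:] for row in gas]
    for r in range(offset, n + offset, 2):
        for c in range(offset, n + offset, 2):
            cells = [(i % n, j % n) for i in (r, r + 1) for j in (c, c + 1)]
            if skip_marked and any(marks[i][j] for i, j in cells):
                continue            # block contains a non-zero mark bit: do nothing
            out = hpp(tuple(gas[i][j] for i, j in cells))
            for (i, j), value in zip(cells, out):
                new[i][j] = value
    return new

def cycle(gas, marks):
    """Even/odd update everywhere, then even/odd update of unmarked blocks only."""
    gas = masked_step(gas, marks, 0, False)
    gas = masked_step(gas, marks, 1, False)
    gas = masked_step(gas, marks, 0, True)
    gas = masked_step(gas, marks, 1, True)
    return gas

def cycle_back(gas, marks):
    """Undo one cycle: the same self-inverse updates, applied in reverse order."""
    gas = masked_step(gas, marks, 1, True)
    gas = masked_step(gas, marks, 0, True)
    gas = masked_step(gas, marks, 1, False)
    gas = masked_step(gas, marks, 0, False)
    return gas
```

In one full cycle an isolated particle advances four sites diagonally in an unmarked
region but only two sites in a marked region, which is the factor of two in speed
mentioned above. Since the mark bits never change, cycle_back undoes cycle exactly.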


We can model many other phenomena by coupling 2x2 block rules. We can, 
for example, use the HPP gas or a finite-impact-parameter variant of it (the TM 
gas [59, 90], which has better momentum-mixing behavior) as a source of pseudo- 
randomness in a diffusion model. We start by again putting two bits at each lattice 
site—one bit will belong to the diffusing system, while the other belongs to the 
randomizing system. Let the four bits of the randomizing system in each block 
simply follow the HPP dynamics described above. Let the four bits of the diffusing 
system be rotated 90° clockwise or counterclockwise, depending on the parity of 
the number of 1’s in the four “random” bits. This results in a perfectly invertible 
diffusion in which no more than one diffusing particle can ever land at the same 
site [90]. Using this approach with enough bits at each site we can, for example, 
model the diffusion and interaction of different chemical species. 
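

The coupled block rule for diffusion can be written down and tested for invertibility
on all 256 joint block states. In this sketch the names are hypothetical, and the choice
of which parity selects the clockwise rotation is our assumption, since the text does
not fix it.

```python
from itertools import product

ALL_BLOCKS = list(product((0, 1), repeat=4))   # blocks as (tl, tr, bl, br)

def hpp(block):
    """Four-direction gas rule, used here only as a source of pseudo-randomness."""
    if block in ((1, 0, 0, 1), (0, 1, 1, 0)):
        tl, tr, bl, br = block
        return (tr, tl, br, bl)
    return block[::-1]

def diffusion_block(d, g):
    """Update one block: d holds the diffusing bits, g the randomizing bits."""
    tl, tr, bl, br = d
    if sum(g) % 2:                  # odd number of 1's among the "random" bits
        d_new = (bl, tl, br, tr)    # rotate the diffusing bits 90 degrees clockwise
    else:
        d_new = (tr, br, tl, bl)    # rotate them 90 degrees counterclockwise
    return d_new, hpp(g)

JOINT_STATES = [(d, g) for d in ALL_BLOCKS for g in ALL_BLOCKS]
invertible = len({diffusion_block(d, g) for d, g in JOINT_STATES}) == len(JOINT_STATES)
conserves_diffusers = all(sum(diffusion_block(d, g)[0]) == sum(d)
                          for d, g in JOINT_STATES)
```

Both flags are true. The rotation only permutes the four sites of a block, so two
diffusing particles can never land on the same site, and the gas rule leaves the parity
of the random bits unchanged, so the rotation can always be identified and undone.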


The HPP CA was originally presented in a different format than we have used 
above [41]. Since this other format is in many cases very natural for discussing MD 
models, we will describe it here, and then relate it to a partitioned description. We 
start by putting four bits of state at each site of a square lattice. We will call the 
bits at the i-th site N_i, S_i, E_i and W_i. The dynamics consists of alternating two 
steps: (1) move the data, and then (2) let the groups of bits that land at each 
site interact separately. The first step moves the data in sheets: If we think of 
the directions on our lattice as being North, South, East and West, then all of the 
N bits are moved one position North, all the S bits one position South, etc. Our 
interaction rule at each site combines the bits that came from different directions 
and sends them out in new directions. A state consisting of two 1’s (particles) that 
came in from opposite directions and two 0’s from the other directions, is changed 
into a state in which the 1’s and 0’s are interchanged—the particles come out at 
right angles to their original directions. In all other cases, particles come out in the 
same directions they came in. 
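

The two-step dynamics just described can be sketched as follows. The conventions are our
own assumptions: each of the four sheets is stored as a square array with periodic
edges, North is taken to be the direction of decreasing row index, and the function
name is hypothetical.

```python
def hpp_site_step(N, S, E, W):
    """One move-then-interact step; each argument is an n x n array of 0/1 values."""
    n = len(N)
    # (1) Move: every sheet shifts one site in its own direction (periodic edges).
    N = [[N[(r + 1) % n][c] for c in range(n)] for r in range(n)]   # toward row 0
    S = [[S[(r - 1) % n][c] for c in range(n)] for r in range(n)]
    E = [[E[r][(c - 1) % n] for c in range(n)] for r in range(n)]   # toward larger columns
    W = [[W[r][(c + 1) % n] for c in range(n)] for r in range(n)]
    # (2) Interact: a head-on pair with nothing else present comes out at right angles.
    for r in range(n):
        for c in range(n):
            state = (N[r][c], S[r][c], E[r][c], W[r][c])
            if state in ((1, 1, 0, 0), (0, 0, 1, 1)):
                N[r][c], S[r][c], E[r][c], W[r][c] = (1 - bit for bit in state)
    return N, S, E, W
```

For example, a North-moving particle at (3, 2) and a South-moving particle at (1, 2)
both arrive at (2, 2) after the movement step, and the interaction turns them into an
East-moving and a West-moving particle at that site.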


We can think of this as a particular kind of partitioning rule, where the four 
bits at each site are the groups, and we use the data-movement step to rearrange 
the bits into new groups. Although in some ways this site-partitioned description 
of the HPP gas is simpler, it also suffers from a slight defect. If we imagine, as we 
did in our discussion of the Ising model, that our lattice is a giant black and white 
checkerboard, then we notice that in one data movement step all of the bits that 
land at black squares came from white squares, and vice versa. No data that is 
currently on a black square will ever interact with data that is currently on a white 
square: We have two completely independent subsystems. The 2x2 block version 
of the HPP rule is isomorphic to just one of these subsystems, and so lets us avoid 
simulating two non-interacting systems. Of course we can also avoid this problem in 
the site-partitioned version with more complicated time-dependent shifts: We can 
always reexpress any partitioned CA as a site-partitioned CA. This fact has been 


Fig. 18.13. Flow past a half-cylinder using a six-direction lattice gas on a triangular 
lattice. We simulate “smoke” streamers to visualize the flow. 


important in the design of our latest CA machines. 


The HPP lattice gas produces a nice round-looking sound wave, but doesn’t 
reproduce 2D hydrodynamics in the large-scale limit. We can clearly make a CA 
model with more speeds and directions by having molecules travel several lattice 
positions horizontally and/or vertically at each step—just add more particles at each 
site, and shift the different momentum fields appropriately during the movement 
step. With enough speeds and directions, it seems obvious that we can get the right 
macroscopic limit—this should be very much like a hard-sphere gas, which acts like a 
fluid. The fact that so many different fluids obey the same hydrodynamic equations 
also suggests that the details of the dynamics can’t matter very much, just the 
constraints such as momentum and particle conservation. 


So how simple a model can work? It was found [28] that we can recover macro- 
scopic 2D hydrodynamics from a model that is only slightly more complicated than 
the HPP gas. A single-speed model with six particles per site, moving in six direc- 
tions on a triangular lattice, will do. If all zero-net-momentum collisions cause the 
molecules at the collision site to scatter into a rotated configuration, and otherwise 
particles go straight, then in the low speed limit we recover isotropic macroscopic 
fluid dynamics. Figure 18.13 shows a simulation of a slightly more complicated 
six-direction lattice gas [12]. The simulation shown is 2K x1K, and we see vortex 
shedding in flow past a half-cylinder. The white streamers are actually a second 
gas (more bits per site!), inserted into the hydrodynamic gas as a kind of smoke 
used to visualize flows in this CA wind tunnel. This is an invertible CA rule, ex- 
cept at the boundaries which are irreversibly being forced (additional bits per site 
mark the boundaries). Simple single-speed CA’s have also been used to simulate 
3D hydrodynamics [2, 22]. 
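

The collision rule described in words above can be sketched for a single site. This is
an illustration under our own assumptions: the six directions are written in skewed
integer coordinates so that momentum sums stay exact, and zero-momentum configurations
are always rotated by 60 degrees in the same sense, which is a simplification chosen
here rather than a detail taken from the models cited.

```python
from itertools import product

# Six lattice directions in skewed (axial) integer coordinates;
# direction k + 1 is direction k rotated by 60 degrees.
DIRS = ((1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1))
ALL_STATES = list(product((0, 1), repeat=6))    # occupancy of the six directions

def momentum(state):
    """Net momentum of a site, in the same skewed coordinates as DIRS."""
    return (sum(s * dx for s, (dx, dy) in zip(state, DIRS)),
            sum(s * dy for s, (dx, dy) in zip(state, DIRS)))

def collide(state):
    """Zero-net-momentum configurations rotate by 60 degrees; all others go straight."""
    if momentum(state) == (0, 0):
        return state[-1:] + state[:-1]
    return state

invertible = len({collide(s) for s in ALL_STATES}) == len(ALL_STATES)
conserves_particles = all(sum(collide(s)) == sum(s) for s in ALL_STATES)
conserves_momentum = all(momentum(collide(s)) == momentum(s) for s in ALL_STATES)
```

All three flags are true. A head-on pair such as (1, 0, 0, 1, 0, 0) comes out as
(0, 1, 0, 0, 1, 0), while any configuration with non-zero net momentum is unchanged.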


Fig. 18.14. Some CA MD simulations. (a) Flow through a porous medium. (b) A topo- 
logically complicated structure within a chemical reaction simulation. (c) Crystallization 
using irreversible discrete forces. 


When it was discovered that lattice gases could simulate hydrodynamic behavior, 
there was a great deal of excitement in some circles and skepticism in others. The 
exciting prospect was that by simplifying MD simulations to the point where only 
the essence of hydrodynamic behavior remained, one could extend the scale of these 
simulations to the point where interesting hydrodynamics could be done directly 
with an MD method. This spawned an entire new field of research [10, 13, 22, 23, 
52, 71, 76]. This optimistic scenario has not yet been realized. One problem is that 
simple single speed models aren’t well suited for simulating high-speed flows. As 
in a photon gas, the sound speed in a single-speed LGA is almost the same as the 
maximum particle speed, making supersonic flows impossible to simulate. You need 
to add more particle speeds to fix this. The biggest problem, though, is that you 
need truly enormous CA systems to get the resolution needed for hydrodynamic 
simulations with high Reynolds numbers [102]. 


For the near term, for those interested in practical modeling, it makes sense 
to avoid high Reynolds numbers and fast fluid flows, and to use MD CA models 
to simulate other kinds of systems that are hard to simulate by more conventional 
numerical techniques. Suitable candidates would include systems for which the 
best current simulation techniques are in fact some form of molecular dynamics, 
as well as systems for which there are at present no good simulation techniques 
because traditional MD cannot reach the hydrodynamic regime. An example would 
be systems with very complicated flows. Figure 18.14a shows a simulated flow 
through a piece of sandstone. The shape of the sandstone was obtained from MRI 
imaging of an actual rock, taking advantage of the ability of CA MD simulations 
to handle arbitrarily shaped obstacles. Shading in the figure indicates flow velocity. 
Simulations were compared against experiments on the same rock that was imaged, 
and agreement was excellent [2, 79]. More complicated flows, involving immiscible 
liquids, have been simulated with this same technique. 


CA models of complex systems can be built up by combining simpler models. 
We simply pile up as many bits as we need at each lattice site, representing as many 
fluids, random variables, heat baths, and other fields as we desire. Then we update 
them in some repeated sequence, including extra steps that make different groups of 
subsystems interact, much as we did in our diffusion and our refraction examples. 
For practical purposes we will often dispense with invertibility, and be satisfied 
with irreversible rules coupled to pseudo random subsystems. Figure 18.14b shows 
an example of a 3D chemical reaction simulation of this sort, which simulates the 
FitzHugh-Nagumo reaction-diffusion dynamics [56]. The knot and its surroundings 
are composed of two different chemical phases. The connectivity of the knot in 
conjunction with domain repulsion keeps the knot from shrinking away. Many kinds 
of multiphase fluids, microemulsions, and other complex fluids have been simulated 
using related techniques [9, 14, 79]. 


We can easily add discrete forces by having particles at a discrete set of vector 
separations interact. If two such particles are heading away from each other we can 
point them toward each other and otherwise leave them unchanged—this results in 
an attraction. This kind of rule isn’t invertible, but it is energy and momentum 
conserving. Figure 18.14c shows a 3D crystallization simulation using a potential 
built up out of such interactions [103]. This is not currently a very practical way 
to simulate crystals, but this kind of technique is generally useful [3, 104]. For 
example, the “smoke” in Figure 18.13 has a weak cohesive force of this kind, which 
makes the smoke streams thinner. 
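

A one-dimensional caricature makes the properties of such a force rule explicit. In
the sketch below, which is entirely our own simplification, two unit-mass particles sit
at an interacting separation along one axis and each has velocity +1 or -1 along it.

```python
from itertools import product

ALL_PAIRS = list(product((-1, 1), repeat=2))    # (v_left, v_right)

def attract(v_left, v_right):
    """Velocities of two particles at an interacting separation.

    The left particle sits at the smaller coordinate. If the pair is heading
    apart (left moving -1, right moving +1), point both toward each other;
    otherwise leave them unchanged.
    """
    if (v_left, v_right) == (-1, 1):
        return (1, -1)
    return (v_left, v_right)

conserves_momentum = all(sum(attract(a, b)) == a + b for a, b in ALL_PAIRS)
conserves_energy = all(sum(v * v for v in attract(a, b)) == a * a + b * b
                       for a, b in ALL_PAIRS)
invertible = len({attract(a, b) for a, b in ALL_PAIRS}) == len(ALL_PAIRS)
```

Momentum and energy are conserved, but invertible is false: a separating pair and an
approaching pair both end up approaching, so the rule cannot be run backwards.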


There are many other ways to build new CA MD models. We often appeal to 
microdynamical analogy, or to simulating “snapshots” of a hypothetical continu- 
ous dynamics. We can take aspects of existing atomistic models and model them 
statistically at a higher level of aggregation using exact integer counts and conser- 
vations, to avoid any possibility of numerical instability [12]. We can combine CA’s 
with more traditional numerical mesh techniques, using discrete particles to handle 
difficult interface regions [50]. We can adapt various energy-based techniques 
from statistical mechanics [9, 38]. We can also build useful models in a less sys- 
tematic manner, justifying their use by simulation and careful measurements [79]. 
Combining well understood CA MD components to build up simulations of more 
complex systems is a kind of iterative programming exercise that involves testing 
components in various combinations, and adjusting interactions. 


Although there is already a role for CA MD models even on conventional comput- 
ers, there is a serious mismatch on such machines between hardware and algorithms. 
If we are going to design MD simulations to fit into a CA format, we should take 
advantage of the uniformity and locality of CA systems, which are ideally suited to 
efficient and large-scale hardware realization. 


18.7 Crystalline computers 


Computer algorithms and computer hardware evolve together. What we mean by 
a good algorithm is that we have found some sort of efficient mapping between the 
computation and the hardware. For example, CA and other lattice algorithms are 
sometimes “efficiently” coded on conventional machines by mapping the parallelism 
and uniformity of the lattice model onto the limited parallelism and uniformity 
of word-wide logical operations—so-called “multi-spin coding.” This is a rather 
grotesque physical realization for models that directly mimic the structure and 
locality of physics: We first build a computer that hides the underlying spatial 
structure of Nature, and then we try to find the best way to contort our spatial 
computation to fit into that mold! 
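
As a minimal sketch of what multi-spin coding means in practice, the following Python fragment packs a one-dimensional ring of one-bit sites into a single integer and updates every site at once with word-wide shifts and an XOR. The rule chosen (each site becomes the XOR of its two neighbors), the 64-bit word size, and the names `step_packed` and `step_sitewise` are illustrative assumptions, not taken from the text.

```python
WIDTH = 64                      # sites per word (assumed word size)
MASK = (1 << WIDTH) - 1         # keeps results inside one word

def rotl(x, k):
    """Rotate a WIDTH-bit word left by k bits (periodic boundary)."""
    return ((x << k) | (x >> (WIDTH - k))) & MASK

def step_packed(word):
    """Update all WIDTH sites at once: new site = left neighbor XOR right neighbor."""
    return rotl(word, 1) ^ rotl(word, WIDTH - 1)

def step_sitewise(bits):
    """Reference version: the same rule applied one site at a time."""
    n = len(bits)
    return [bits[(i - 1) % n] ^ bits[(i + 1) % n] for i in range(n)]

def unpack(word):
    """List the WIDTH site values held in a packed word, site 0 first."""
    return [(word >> i) & 1 for i in range(WIDTH)]
```

The packed version does with two rotations and one XOR what the site-by-site version does in 64 separate steps, which is the sense in which the lattice's parallelism is mapped onto word-wide logic.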


Ultimately all physical computations have to fit into a spatial mold, and so our 
most efficient computers and algorithms will eventually have to evolve toward the 
underlying spatial “hardware” of Nature [94]. Because physical information can 
travel at only a finite velocity, portions of our computation that need to commu- 
nicate quickly must be physically located close together. Computer architects can 
only hide this fact-of-life from us for so long. At some point, if we want our com- 
putations to run faster, our algorithms must take on the responsibility of dealing 
with this constraint. 


Computer engineers are not unaware of this spatial constraint. Various locally- 
interconnected parallel computers have been built and studied [53]. Mesh architec- 
tures are organized like a kind of CA, but usually with a rather powerful computer 
with a large memory at each lattice site. Unlike CA’s, they normally don’t have 
the same operation occurring everywhere in the lattice at the same time. SIMD or 
data parallel mesh machines are more CA-like, since they typically have a simpler 
processor at each site, and they do normally have the operation of all processors 
synchronized in perfect lockstep. 


Another important spatial computing device is the gate array. These regular 
arrays of logic elements are very much like a universal CA. Initially, we build these 
chips with arrays of logic elements, but we leave out the wiring. Later, when we 
need a chip with a specific functionality, we can quickly “program” the gate array 
by simply adding wires to connect together these elements into an appropriate 
logic circuit. FPGA’s (field programmable gate arrays) make programming the 
interconnections even easier. What should be connected to what is specified by 
some bits that we communicate directly to the chip: This rapid transition from 
bits to circuitry eliminates much of the distinction that is normally made between 
hardware and software [73]. 


As general purpose computing devices, none of these CA-like machines are sig- 
nificant mainstream technologies—the evolutionary forces that are pushing us in 
the CA direction haven’t yet pushed hard enough. This was even more true when 
Tom Toffoli and I first started playing with CA’s together, almost two decades ago. 
Fig. 18.15. CA machines. (a) Our earliest CA machines scanned their memory like a 
framebuffer, applying discrete logic to a sequence of neighborhood windows. (b) CAM-8 
uses a 3D mesh array of SIMD processors. Each processor handles part of the lattice, 
alternating data movement with sequential lookup-table updating of the groups of bits 
that land at each lattice site. 


There were no machines available to us then that would let us run and display CA’s 
quickly enough that we could experience them as dynamical worlds. We became in- 
creasingly frustrated as we tried to explore the new and exciting realm of invertible 
CA’s on available graphical workstations: Each successive view of our space took 
minutes to compute. 


Tom designed and built our first CA simulation hardware. This CA machine was 
a glorified frame-buffer that kept the CA space in memory chips. As it scanned a 2D 
array of pixels out of the memory, instead of just showing them on a screen, it first 
applied some neighborhood logic to the data, then sent each resulting pixel to the 
screen while also putting it back into memory. At first the neighborhood logic was on 
a little prototyping board (this is shown schematically in Figure 18.15a). I learned 
about digital circuitry by building site-update rules out of TTL chips—each small 
logic circuit I built became the dynamical law for a different video-rate universe! 
Eventually, we switched to lookup tables, which largely replaced the prototype 
circuits. The first few generations of machines, however, all had wires that you could 
move around to change things—whenever you wanted to demonstrate something, 
you would invariably find that someone else had rewired your machine [88]. 
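
The scan-and-lookup scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-site window (center plus four nearest neighbors), the wraparound at the edges, the parity rule, and the names `make_table` and `scan_frame` are all hypothetical choices, and the sketch writes into a second buffer rather than reproducing the pipelining of the actual hardware.

```python
def make_table(rule):
    """Tabulate a rule over all 32 values of the 5-bit window (C, N, S, E, W)."""
    return [rule((k >> 4) & 1, (k >> 3) & 1, (k >> 2) & 1, (k >> 1) & 1, k & 1)
            for k in range(32)]

def scan_frame(frame, table):
    """One raster scan: each pixel's window indexes the table; results form the next frame."""
    h, w = len(frame), len(frame[0])
    nxt = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            c = frame[y][x]
            n = frame[(y - 1) % h][x]
            s = frame[(y + 1) % h][x]
            e = frame[y][(x + 1) % w]
            wst = frame[y][(x - 1) % w]
            index = (c << 4) | (n << 3) | (s << 2) | (e << 1) | wst
            nxt[y][x] = table[index]
    return nxt

# Example rule (assumed): parity of the five window bits.
parity_table = make_table(lambda c, n, s, e, w: c ^ n ^ s ^ e ^ w)
```

Changing the dynamical law means only refilling the 32-entry table, which is the software analogue of swapping a hand-wired logic circuit for a lookup-table memory chip.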


We went through several generations of hardware, playing with models that no 
one else had ever seen [90]. At a certain point we rediscovered the HPP lattice 
gas, and our simulations rekindled interest in this kind of model. At this point, 
our machines became inadequate. They had been designed for interacting and 
experimenting with small 2D systems that you could watch on a monitor. Real 
CA MD was going to require large-scale 3D machines designed for serious scientific 
simulation, with provisions for extensive data analysis and visualization. A new 
dedicated CA MD machine could be about 1000 times as cost effective as existing 
supercomputers for this task, and would provide the interactivity and flexibility of 
a personal CA workstation. 


I designed the new architecture (CAM-8) based on the experience that Tom 
and I had with our earlier machines [66]. As shown in Figure 18.15b, this machine 
uses a 3D mesh array of SIMD processors running in perfect lockstep. The lattice 
being simulated (which can be n dimensional) is divided up evenly among the 
processors, each processor handling an equal sector of the overall lattice. As in our 
earlier machines, the state of the lattice is kept in ordinary memory chips, and the 
updating is done by lookup tables—also memory chips. Data cycles around and 
around between these two sets of memory chips. 


Unlike our previous machines, which provided a fixed set of traditional CA neigh- 
borhoods, the only neighborhood format supported by CAM-8 is site-partitioning 
(discussed in Section 18.6). Any bit field (i.e., set of corresponding bits, one from 
every site) can be shifted uniformly across the lattice in any direction. Whatever 
data land at a given lattice site are updated together as a group. This is a particu- 
larly convenient neighborhood format from a modeling point of view, since any CA 
dynamics on any neighborhood can be accomplished by performing an appropriate 
sequence of data shifts and site updates, each acting on a limited number of site 
bits at a time. Also, as we've seen, partitioning is a particularly good format for 
constructing models that incorporate desired physical properties. 
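
A hedged sketch of this shift-then-update cycle, using the HPP lattice gas as the example: each site carries four bits (particles moving East, West, North, South), each bit field is shifted uniformly in its own direction, and the four bits that land on a site are then replaced through a 16-entry lookup table. The data layout, the wraparound boundaries, and the function names are assumptions made for illustration; this is not the CAM-8 software interface.

```python
# Bit fields (one bit per site each): particles moving East, West, North, South.
E, W, N, S = 0, 1, 2, 3
SHIFT = {E: (0, 1), W: (0, -1), N: (-1, 0), S: (1, 0)}   # (row, column) offsets

def shift_field(field, dy, dx):
    """Shift one bit field uniformly across the lattice, with wraparound."""
    h, w = len(field), len(field[0])
    return [[field[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]

def hpp_table():
    """16-entry site-update table: head-on pairs scatter by 90 degrees."""
    table = list(range(16))                 # default: bits pass through unchanged
    ew, ns = (1 << E) | (1 << W), (1 << N) | (1 << S)
    table[ew], table[ns] = ns, ew
    return table

def step(fields, table):
    """One step: shift every bit field, then update each site by table lookup."""
    moved = [shift_field(fields[d], *SHIFT[d]) for d in (E, W, N, S)]
    h, w = len(moved[0]), len(moved[0][0])
    out = [[[0] * w for _ in range(h)] for _ in range(4)]
    for y in range(h):
        for x in range(w):
            site = sum(moved[d][y][x] << d for d in range(4))
            new = table[site]
            for d in range(4):
                out[d][y][x] = (new >> d) & 1
    return out
```

Because the table is a permutation of the 16 site states and the shifts can be undone, the whole step is invertible, and because each site is updated independently the inner loops could run in any order or be split among processors.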


Site partitioning is also a very convenient neighborhood format from a hardware 
standpoint. Since the pattern of data movement is very simple and regular, mesh 
communication between processors is also very simple. Since the updating is done 
on each site independently, it doesn’t matter what order the sites are updated in, 
or how many different processors are involved in the updating. All of this can be 
organized in the manner most efficient for the hardware. 


CAM-8 machines were built and performed as expected. All of the simulations 
depicted in this chapter were done on CAM-8, except for that of Figure 18.14a 
(which is similar to [2]). CAM-8 has not, unfortunately, had the impact that we 
hoped it might. First of all, during the time that we were building the machine, 
it was found that lattice gases weren’t as well suited for high Reynolds number 
hydrodynamic flow simulations as people had hoped. In addition, in the absence 
of any good CA machines, interest in lattice dynamics calculations had shifted to 
techniques that make better use of the floating point capabilities of conventional 
computers. Also, most researchers interested in developing practical applications 
already had good access to conventional supercomputers, which were 1000 times less 
cost effective than CAM-8, but had familiar and high-quality software and system 
support. Finally, the evolutionary forces favoring CA-like machines were temporarily 
on the wane at the time when CAM-8 was completed, as multiprocessor funding 
dried up and fine-grained parallel computing companies folded. We didn’t build 
any versions of our indefinitely scalable CAM-8 machine that were large enough 
to make previously unreachable realms of CA modeling accessible—as our early 
machines first opened the CA world to us. 


In the near term, prospects again look good for CA machines. Although our 
small personal-computer-scale CAM-8 machines are still about as good as any su- 
percomputer for LGA computations, advances in technology make radical improve- 
ments possible. By putting logic directly on DRAM memory chips, which is now 
routinely done, and by exploiting the enormous memory bandwidth that can be 
made available on-chip, it is possible today to make a SIMD machine similar to the 
CAM-8 that is over 10,000 times faster per memory chip than the current CAM- 
8 hardware [69]. Putting together arrays of such chips, qualitatively new physical 
simulations will become possible. Other SIMD applications such as logic simulation, 
image processing and 3D bit-map manipulation/rendering will also run at about a 
trillion bit-operations per second per chip. Whether or not we manage to make our 
next dream machine, the time is ripe for commercial SIMD-based CA machines. 


What of the more distant future? In the preceding sections, we have emphasized 
invertible CA’s. Aside from their intrinsic interest, they have the virtue that they 
mimic the microscopic invertibility of physical dynamics. From a macroscopic point 
of view, this means that these CA’s can in principle be implemented using friction- 
less reversible mechanisms—they don’t depend on dissipative thermodynamically 
irreversible processes in order to operate [27, 100, 106, 107]. Thus 3D machines 
based on invertible CA’s won’t have the same problem getting rid of dissipated 
heat that irreversible machines do [4, 5, 26]. From a more microscopic point of 
view, we can see the match between invertible computation and invertible quantum 
physics as making possible direct use of quantum scale elements and processes to 
do our computations. We can make use of discrete properties of these quantum 
elements to represent our digital quantities and perform our digital computations. 


Thus in the distant future I expect that our most powerful large-scale general 
purpose computers will be built out of macroscopic crystalline arrays of identical 
invertible computing elements. We would make such large arrays out of identical 
elements because they will then be easier to control, to design, to build and to 
test. These will be the distant descendants of today’s SIMD and FPGA computing 
devices: When we need to perform inhomogeneous computations, we will put the 
irregularities into the program, not the hardware. The problem of arranging the 
pieces of a computation in space will be part of the programming effort: Architec- 
tural ideas that are used today in physical hardware may reappear as data structures 
within this new digital medium. With molecular scale computing elements, a small 
chunk of this computronium [64] would have more memory and processing power 
than all of the computers in the world today combined, and high Reynolds number 
CA MD calculations of fluid flow would be practical on such machines. 


Note that I don’t expect our highest performance general purpose computers 
to be quantum spin computers of the sort discussed in Section 18.1. In such a 
machine, the whole computer operates on a superposition of distinct computations 
simultaneously. This kind of quantum parallelism is very delicate, and the overhead 
associated with the difficult task of maintaining a superposition of computations 
over a large spatial scale will be such that it will only be worth doing in very 
specialized situations—if it is possible at all [75]. This won’t be something that we 
will do in our general purpose computers. 


18.8 What makes a CA world interesting? 


Future CA machines will make extensive large-scale CA simulations possible—we 
will be able to study the macroscopic properties of CA worlds that we design. 
Aside from issues of size and speed, there doesn’t seem to be any obvious reason why 
exact classical information models cannot achieve as high a level of rich macroscopic 
complexity as we see in our universe. This is a very different modeling challenge 
than trying to simulate quantum mechanics with CA’s. We would like to simulate 
an interesting macroscopic world which is built out of classical information. It is 
instructive to try to see what the difficulties might be. 


The most important thing in our universe that makes it interesting to us is of 
course us. Or more generally, the existence of complex organisms. Thus let’s begin 
by seeing what it might take to simulate a world in which Darwinian evolution is 
likely to take place. Since no one has yet made a CA that does this, our discussion 
will be quite speculative. 


One of the most successful computer models of evolution is Tom Ray’s Tierra 
[77, 78], which was designed to capture—in an exact digital model—an essential set 
of physical constraints abstracted from biology. His model did not include spatial 
locality or invertibility, but we could try to add these features. 


Modeling evolution in a robust spatial fashion may, however, entail incorporating 
some physical properties into our CA systems that are not so obvious [65]. For 
example, in Nature we have the property that we can take complicated objects and 
set them in motion. This property seems to be essential for robust evolution: It 
is hard to imagine the evolution of complex organisms if simpler pieces of them 
can’t move toward each other! No known universal CA has this property (but 
see [3, 18, 104]). There is nothing in the Life CA, for example, that corresponds to 
a glider-gun in motion. 


The general property of physics that allows us to speak about an object at rest 
and then identify the same object in motion is relativistic invariance. The fact that 
the laws of physics look the same in different relativistic frames means that we can 
have the same complex macroscopic objects in all states of motion: An organism’s 
chemistry doesn’t stop working if it moves, or if the place it lives moves! In a 
spatial CA dynamics, some sort of spatial macroscopic motion invariance would 
clearly make evolution more likely. Since our CA’s have a maximum speed at 
which information can travel—a finite speed of light—relativistic invariance is a 
possible candidate. Full relativistic invariance may be more than we need, but it is 
interesting to ask, “Can we incorporate relativistic invariance into a universal CA 
model?” 


We have already seen that we can have macroscopic rotational invariance in our 
lattice gas models, and we know that numerical mesh calculations of relativistic dy- 
namics are possible. Thus achieving a good approximation of relativistic invariance 
in the macroscopic limit for an exact CA model seems possible [43]. Such a system 
would, at least in the limit, have the conservations associated with the continu- 
ous Lorentz group of symmetries. Although it is not possible to put a continuous 
symmetry directly into a discrete CA rule, it is certainly possible to put these con- 
servations into the rule, along with a discrete version of the symmetries—just as we 
did in our lattice gas models.³ 


Thus we might imagine our relativistically invariant CA to be a refinement of 
lattice gases—we would also like to make it invertible for the reasons discussed 
in Section 18.4. But we also demand that this CA incorporate computation uni- 
versality. This may not be easy: Since a relativistically invariant system must 
have momentum conservation, we will need to worry about how to hold complex 
interacting structures together. Thus we may need to incorporate some kind of 
relativistically invariant model of forces into our system. 


Simulating forces in an exact and invertible manner is not so easy, particularly 
if we want the forces to last for a long time [105]. Models in which forces are com- 
municated by having all force-sources continuously broadcast field-particles have 
the problem that the space soon fills up with these field-particles—which cannot be 
erased because of local invertibility—and then the force can no longer be communi- 
cated. Directly representing field gradients works better, but making this work in 
a relativistic context may be hard. 


At this point, we might also begin to question our basic CA assumptions. We 
introduced crystalline CA’s to try to emulate the spatial locality of physics in our 
informational models, but we are now discussing modeling phenomena in a realm of 
physics in which modern theories talk about extra dimensions and variable topology. 
Perhaps whatever is essential fits nicely into a simple crystalline framework, but 
perhaps we need to consider alternatives. We could easily be led to informational 
models in which the space and time of the microscopic computation becomes rather 
divorced from the space and time of the macroscopic phenomena. 


We started this section with the (seemingly) modest goal of using a CA to try to 
capture aspects of physics necessary for a robust evolution of interesting complexity, 
and we have been led to discuss incorporating larger and larger chunks of physics 
into our model. Perhaps our vision is too limited, and there are radically different 
ways in which we can have robust evolution in a spatial CA model. Or perhaps we 
can imitate Nature, but cheat in various ways. We may not need full relativistic 
invariance. We may not need exact invertibility. On the other hand, it is also 
perfectly possible that we can’t cheat very much and still get a system that’s nearly 
as interesting as our world. 


³ In continuum physics, continuous symmetries are regarded as fundamental and conservations 
arise as a consequence of symmetry. Fredkin has pointed out that in discrete systems, it must be 
the conservation that is fundamental. 


18.9 Conclusion 


I was in high school when I first encountered cellular automata—I read an article 
about Conway’s “Game of Life.” At that time I was intensely interested in both 
Physics and Computers, and this game seemed to combine the two of them. I 
immediately wondered if our universe might be a giant cellular automaton. 


The feeling that physics and computation are intimately linked has remained 
with me over the years. Trying to understand the difficulties of modeling Nature 
using information has provided an interesting viewpoint from which to learn about 
physics, and also to learn about the ultimate possibilities of computer hardware. 
I have learned that many properties of macroscopic physics can be mirrored in 
simple informational models. I have learned that quantum mechanics makes both 
the amount and the rate-of-change of information in physical systems finite—all 
physical systems do finite information processing [68]. I have learned that the 
non-separability of quantum systems makes it hard to model them efficiently us- 
ing classical information—it is much easier to construct quantum spin computer 
models [1]. I have learned that physics-like CA models can be the best possible 
algorithms when the computer hardware is also adapted to the constraints and 
structure of physical dynamics. I have learned that developing computer hardware 
that promotes this viewpoint can consume an enormous amount of time! 


Since classical information is much easier to understand than quantum informa- 
tion I have mostly studied classical CA models. In these systems, a macroscopic 
dynamical world arises from classical combinatorics. Continuous classical-physics 
behavior can emerge in the large-scale limit. We can try to model and understand 
(and perhaps teach our students about) all sorts of physical phenomena without get- 
ting involved in quantum complications. We can also try to clarify our understand- 
ing of the fundamental quantities, concepts and principles of classical mechanics and 
of classical computation by studying such systems [32]. The principle of stationary 
action, for example, must arise in such systems solely from combinatorics—there is 
no underlying quantum substratum in a CA model. Conversely, we should remem- 
ber that information (in the guise of entropy) was an important concept in physics 
long before it was discovered by computer scientists. Just as Ising-like systems 
have provided intuitive classical models that have helped clarify issues in statistical 
mechanics, CA’s could play a similar role in dynamics. 


Since we have focused so much on discrete classical CA models of physics, it 
might be appropriate to comment briefly on their relationship to discrete quantum 
models—Feynman’s quantum spin computer of Section 18.1. Exactly the same kinds 
of grouping and sublattice techniques that we have used to construct invertible CA’s 
also allow us to construct quantum CA’s—QCA’s [61]. We simply replace invertible 
transformations on groups of bits with unitary transformations on groups of spins. 
Just as it is an interesting problem to try to recover classical physics from ordinary 
CA’s, it is also interesting to try to find QCA’s that recover the dynamics of known 
quantum field theories in the macroscopic limit [11, 57, 58, 105]. Following our Ising 
CA example, it might be instructive to investigate classical CA’s that are closely 
related to such QCA’s. 
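
One way to see why the replacement is so direct: an invertible update on a group of bits is a permutation of the group's states, and a permutation written as a matrix on basis states is already a (real) unitary matrix. The sketch below checks this for a hypothetical two-bit rule, (a, b) -> (a, a XOR b); the rule and all function names are assumptions chosen for illustration. A general QCA block would allow any unitary matrix in place of the permutation.

```python
def permutation_matrix(table):
    """Matrix U with U[table[s]][s] = 1: maps basis state s to basis state table[s]."""
    n = len(table)
    assert sorted(table) == list(range(n)), "update rule must be invertible"
    u = [[0] * n for _ in range(n)]
    for s, t in enumerate(table):
        u[t][s] = 1
    return u

def matmul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    """Transpose; for a real matrix this is also the conjugate transpose."""
    return [list(row) for row in zip(*a)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

# Hypothetical invertible rule on a 2-bit group: (a, b) -> (a, a XOR b), state = 2*a + b.
cnot_table = [0, 1, 3, 2]
U = permutation_matrix(cnot_table)
```

Here `matmul(U, transpose(U))` equals the identity, which is the unitarity condition for a real matrix.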


Although people have often studied CA’s as abstract mathematical systems 
completely divorced from Nature, ultimately it is their connections to physics that 
make them so interesting. We can use them to try to understand our world better, 
to try to do computations better—or we can simply delight in the creation of our 
own toy universes. As we sit in front of our computer screens, watching to see 
what happens next, we never really know what new tricks our CA’s may come up 
with. It is really an exploration of new worlds—live television from other universes. 
Working with CA’s, anyone can experience the joy of building simple models and 
the thrill of discovering something new about the dynamics of information. We can 
all be theoretical physicists. 
Notes on the references 


Much of the material in this chapter is discussed at greater length in [62] and 
[90]. These documents were both strongly influenced by ideas and suggestions 
from Edward Fredkin, some of which are also discussed in [30-35]. Some recent 
books on CA modeling of physics are [79] and [18]. Many of the early lattice- 
gas and quantum computing papers are reproduced in [22] and [54] respectively. 
Recent related papers can be found online in the comp-gas and quant-ph archives 
at http://xxx.lanl.gov, and cross-listed there from other archives at LANL such 
as chao-dyn. Pointers to papers in these archives are given in some of the references 
below. 
INFORMATION, PHYSICS, QUANTUM: THE 
SEARCH FOR LINKS 


John Archibald Wheeler *† 


Abstract 


This report reviews what quantum physics and information theory have to tell us 
about the age-old question, How come existence? No escape is evident from four 
conclusions: (1) The world cannot be a giant machine, ruled by any preestablished 
continuum physical law. (2) There is no such thing at the microscopic level as 
space or time or spacetime continuum. (3) The familiar probability function or 
functional, and wave equation or functional wave equation, of standard quantum 
theory provide mere continuum idealizations and by reason of this circumstance 
conceal the information-theoretic source from which they derive. (4) No element in 
the description of physics shows itself as closer to primordial than the elementary 
quantum phenomenon, that is, the elementary device-intermediated act of posing a 
yes-no physical question and eliciting an answer or, in brief, the elementary act of 
observer-participancy. Otherwise stated, every physical quantity, every it, derives 
its ultimate significance from bits, binary yes-or-no indications, a conclusion which 
we epitomize in the phrase, it from bit. 


19.1 Quantum Physics Requires a New View of Reality 


Revolution in outlook though Kepler, Newton, and Einstein brought us [1-4], and 
still more startling the story of life [5-7] that evolution forced upon an unwilling 
world, the ultimate shock to preconceived ideas lies ahead, be it a decade hence, 
a century or a millennium. The overarching principle of 20th-century physics, the 
quantum [8] — and the principle of complementarity [9] that is the central idea of the 
quantum — leaves us no escape, Niels Bohr tells us [10], from “a radical revision 
of our attitude as regards physical reality” and a “fundamental modification of 
all ideas regarding the absolute character of physical phenomena.” Transcending 
Einstein’s summons [11] of 1908, “This quantum business is so incredibly important 
and difficult that everyone should busy himself with it,” Bohr’s modest words direct 
us to the supreme goal: Deduce the quantum from an understanding of existence. 


*Reproduced from Proc. 3rd Int. Symp. Foundations of Quantum Mechanics, Tokyo, 1989, 
pp.354-368. 


†The copyright of this paper is held by the author. 
How make headway toward a goal so great against difficulties so large? The 
search for understanding presents to us three questions, four no’s and five clues: 


Three questions, 

• How come existence? 
• How come the quantum? 
• How come “one world” out of many observer-participants? 

Four no’s, 

• No tower of turtles 
• No laws 
• No continuum 
• No space, no time. 

Five clues, 

• The boundary of a boundary is zero. 
• No question? No answer! 
• The super-Copernican principle. 
• “Consciousness” 
• More is different. 


19.2 “It from Bit” as Guide in Search for Link Connecting 
Physics, Quantum and Information 


In default of a tentative idea or working hypothesis, these questions, no’s and clues 
— yet to be discussed — do not move us ahead. Nor will any abundance of clues 
assist a detective who is unwilling to theorize how the crime was committed! A 
wrong theory? The policy of the engine inventor, John Kris, reassures us, “Start 
her up and see why she don’t go!” In this spirit [12-47] I, like other searchers [48-51] 
attempt formulation after formulation of the central issues, and here present a wider 
overview, taking for working hypothesis the most effective one that has survived this 
winnowing: It from bit. Otherwise put, every it — every particle, every field of 
force, even the spacetime continuum itself — derives its function, its meaning, its 
very existence entirely — even if in some contexts indirectly — from the apparatus- 
elicited answers to yes or no questions, binary choices [52], bits. 
It from bit symbolizes the idea that every item of the physical world has at 
bottom — at a very deep bottom, in most instances — an immaterial source and 
explanation; that what we call reality arises in the last analysis from the posing of 
yes-no questions and the registering of equipment-evoked responses; in short, that 
all things physical are information-theoretic in origin and this is a participatory 
universe. 


Three examples may illustrate the theme of it from bit. First, the photon. With 
polarizer over the distant source and analyzer of polarization over the photodetector 
under watch, we ask the yes or no question, “Did the counter register a click during 
the specified second?” If yes, we often say, “A photon did it.” We know perfectly 
well that the photon existed neither before the emission nor after the detection. 
However, we also have to recognize that any talk of the photon “existing” during 
the intermediate period is only a blown-up version of the raw fact, a count. 


The yes or no that is recorded constitutes an unsplitable bit of information. A 
photon, Wootters and Zurek demonstrate [53, 54], cannot be cloned. 


As second example of it from bit, we recall the Aharonov-Bohm scheme [55] to 
measure a magnetic flux. Electron counters stationed off to the right of a doubly- 
slit screen give yes-or-no indications of the arrival of an electron from the source 
located off to the left of the screen, both before the flux is turned on and afterward. 
That flux of magnetic lines of force finds itself embraced between — but untouched 
by — the two electron beams that fan out from the two slits. The beams interfere. 
The shift in interference fringes between field off and field on reveals the magnitude 
of the flux, 


$$
\begin{aligned}
&(\text{phase change around perimeter of the included area}) \\
&\quad = 2\pi \times (\text{shift of interference pattern, measured in number of fringes}) \\
&\quad = (\text{electron charge}) \times (\text{magnetic flux embraced}) / \hbar c
\end{aligned}
\tag{19.1}
$$


Here $\hbar = 1.0546 \times 10^{-27}\ \mathrm{g\,cm^2/s}$ is the quantum in conventional units, or in 
geometric units [4, 16] — where both time and mass are measured in the units of 
length — $\hbar = \hbar G/c^3 = 2.612 \times 10^{-66}\ \mathrm{cm}^2$ = the square of the Planck length, 
$1.616 \times 10^{-33}\ \mathrm{cm}$ = what we hereafter term the Planck area. 
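
The conversion to geometric units can be checked numerically. The short sketch below takes the conventional-unit value of the quantum quoted above and multiplies by $G/c^3$; the values of $G$ and $c$ are standard CGS figures supplied here as assumptions, since the text does not quote them.

```python
import math

HBAR = 1.0546e-27   # g cm^2 / s, value quoted in the text
G = 6.674e-8        # cm^3 / (g s^2), assumed: not given in the text
C = 2.998e10        # cm / s, assumed: not given in the text

planck_area = HBAR * G / C**3           # cm^2
planck_length = math.sqrt(planck_area)  # cm
```

With these inputs `planck_area` agrees with the quoted Planck area and `planck_length` with the quoted Planck length to the precision given in the text.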


Not only in electrodynamics but also in geometrodynamics and in every other 
gauge-field theory, as Anandan, Aharonov and others point out [56, 57] the differ- 
ence around a circuit in the phase of an appropriately chosen quantum-mechanical 
probability amplitude provides a measure of the field. Here again the concept of it 
from bit applies [38]. Field strength or spacetime curvature reveals itself through 
shift of interference fringes, fringes that stand for nothing but a statistical pattern 
of yes-or-no registrations. 


Fig. 19.1. Symbolic representation of the “telephone number” of the particular one of the 
$2^N$ conceivable, but by now indistinguishable, configurations out of which this particular 
blackhole, of Bekenstein number $N$ and horizon area $4N\hbar\log_e 2$, was put together. Symbol, 
also, in a broader sense, of the theme that every physical entity, every it, derives from bits. 
Reproduced from JGST, p.220. 


When a magnetometer reads that it which we call a magnetic field, no reference 
at all to a bit seems to show itself. Therefore we look closer. The idea behind the 
operation of the instrument is simple. A wire of length $l$ carries a current $i$ through 
a magnetic field $B$ that runs perpendicular to it. In consequence the piece of copper 
receives in the time $t$ a transfer of momentum $p$ in a direction $z$ perpendicular to 
the directions of the wire and of the field, 


$$
\begin{aligned}
p &= B\,l\,i\,t \\
  &= (\text{flux per unit } z) \times (\text{charge, } e, \text{ of the elementary carrier of current}) \\
  &\quad \times (\text{number, } N, \text{ of carriers that pass in the time } t)
\end{aligned}
\tag{19.2}
$$

This impulse is the source of the force that displaces the indicator needle of the 
magnetometer and gives us an instrument reading. We deal with bits wholesale 
rather than bits retail when we run the fiducial current through the magnetometer 
coil, but the definition of field founds itself no less decisively on bits. 
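
The regrouping behind Eq. (19.2) can be written out in one more step. This is a sketch in the units of that equation, reading "flux per unit $z$" as the flux $B\,l$ swept by the wire per unit displacement along $z$, and using the fact that the charge carried past a point in time $t$ is $i\,t = N e$:

$$
\begin{aligned}
p &= B\,l\,i\,t \\
  &= (B\,l)\,(i\,t) \\
  &= (B\,l)\,(N e) .
\end{aligned}
$$

The field therefore enters the instrument reading only through a count $N$ of elementary carriers, each of which either did or did not pass during the time $t$.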


As third and final example of it from bit we recall the wonderful quantum 
finding of Bekenstein [58-60] — totally unexpected denouement of earlier classical 
work of Penrose (61] Christodoulou [62] and Ruffini [63] — refined by Hawking (64, 
65] that the surface area of the horizon of a blackhole, rotating or not, measures 
the entropy of the blackhole. Thus this surface area, partitioned in imagination 
(Fig. 19.1) into domains each of size 4ħ log_e 2, that is, 2.77... times the Planck area,
yields the Bekenstein number, N; and the Bekenstein number, so Thorne and Zurek
explain [66] tells us the number of binary digits, the number of bits, that would be 
required to specify in all detail the configuration of the constituents out of which 
the blackhole was put together. Entropy is a measure of lost information. To no 
community of newborn outside observers can the blackhole be made to reveal out 
of which particular one of 2^N configurations it was put together. Its size, an it, is
fixed by the number, N, of bits of information hidden within it. 
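
As a rough illustration of the scale involved, the Bekenstein number can be evaluated by dividing a horizon area by the domain size 4 log_e 2 Planck areas. The sketch below is an assumption-laden example, not taken from the text: it uses the Planck area 2.612 × 10⁻⁶⁶ cm² quoted later in this chapter and, as a hypothetical case, a non-rotating blackhole of one solar mass with G·M/c² taken as about 1.477 km.

```python
import math

PLANCK_AREA_CM2 = 2.612e-66          # hbar in geometric units, as quoted in the text
DOMAIN_IN_PLANCK_AREAS = 4 * math.log(2)  # 4 log_e 2 = 2.77... Planck areas per bit


def bekenstein_number(horizon_area_cm2):
    """Number of bits N = (horizon area) / (4 * hbar * log_e 2)."""
    return horizon_area_cm2 / (DOMAIN_IN_PLANCK_AREAS * PLANCK_AREA_CM2)


# Hypothetical example: non-rotating blackhole of one solar mass.
GM_OVER_C2_SUN_CM = 1.477e5          # assumed value of G*M_sun/c^2 in cm
# Horizon radius is 2GM/c^2, so the area is 4*pi*(2GM/c^2)^2 = 16*pi*(GM/c^2)^2.
area_sun_cm2 = 16 * math.pi * GM_OVER_C2_SUN_CM ** 2
N_sun = bekenstein_number(area_sun_cm2)
print(f"domain size = {DOMAIN_IN_PLANCK_AREAS:.4f} Planck areas")
print(f"N (solar-mass blackhole) ~ {N_sun:.2e} bits")
```

Under these assumptions the count comes out of order 10⁷⁷ bits, which gives some feel for how much information the horizon area stands for.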


The quantum, ħ, in whatever correct physics formula it appears, thus serves as
lamp. It lets us see horizon area as information lost, understand wave number of 
light as photon momentum and think of field flux as bit-registered fringe shift. 


Giving us its as bits, the quantum presents us with physics as information. 


How come a value for the quantum so small as ħ = 2.612 × 10⁻⁶⁶ cm²? As well
ask why the speed of light is so great as c = 3 × 10¹⁰ cm/s! No such constant
as the speed of light ever makes an appearance in a truly fundamental account 
of special relativity or Einstein geometrodynamics, and for a simple reason: Time 
and space are both tools to measure interval. We only then properly conceive 
them when we measure them in the same units [4, 16]. The numerical value of 
the ratio between the second and the centimeter totally lacks teaching power. It 
is an historical accident. Its occurrence in equations obscured for decades one of 
Nature’s great simplicities. Likewise with ħ! Every equation that contains an ħ
floats a banner, “It from bit”. The formula displays a piece of physics that we 
have learned to translate into information-theoretic terms. Tomorrow we will have 
learned to understand and express all of physics in the language of information. At 
that point we will revalue ħ = 2.612 × 10⁻⁶⁶ cm² — as we downgrade c = 3 × 10¹⁰
cm/s today — from constant of Nature to artifact of history, and from foundation 
of truth to enemy of understanding. 
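
The value quoted for the quantum in units of area can be recovered from conventional constants. The sketch below assumes, as an interpretation not spelled out in this passage, that the quoted figure is ħG/c³ (the square of the Planck length) in CGS units; the constants are standard reference values supplied here, not taken from the text.

```python
# Convert the quantum of action to geometric units (an area), assuming the
# quoted value is hbar * G / c**3 evaluated in CGS units.
HBAR = 1.054571817e-27   # erg s
G = 6.67430e-8           # cm^3 g^-1 s^-2
C = 2.99792458e10        # cm/s

hbar_geometric_cm2 = HBAR * G / C ** 3
planck_length_cm = hbar_geometric_cm2 ** 0.5

# The factor c itself is only the conversion between the second and the centimeter.
one_second_in_cm = C * 1.0

print(f"hbar in geometric units ~ {hbar_geometric_cm2:.4e} cm^2")
print(f"Planck length           ~ {planck_length_cm:.4e} cm")
print(f"one second of time      = {one_second_in_cm:.4e} cm")
```

Seen this way, both numbers are conversion factors between historically chosen units, which is the point the paragraph makes.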


19.3 Four No’s 


To the question, “How come the quantum?” we thus answer, “Because what we 
call existence is an information-theoretic entity.” But how come existence? Its 
as bits, yes; and physics as information, yes; but whose information? How does 
the vision of one world arise out of the information-gathering activities of many 
observer-participants? In the consideration of these issues we adopt for guidelines 
four no’s. 


First no: “No tower of turtles,” advised William James. Existence is not a globe 
supported by an elephant, supported by a turtle, supported by yet another turtle, 
and so on. In other words, no infinite regress. No structure, no plan of organization, 
no framework of ideas underlaid by another structure or level of ideas, underlaid 
by yet another level, by yet another, ad infinitum, down to a bottomless night. To 
endlessness no alternative is evident but loop [47, 67], such a loop as this: Physics 
gives rise to observer-participancy; observer-participancy gives rise to information;
and information gives rise to physics. 


Existence thus built [68] on “insubstantial nothingness”? Rutherford and Bohr 
made a table no less solid when they told us it was 99.9... percent emptiness. 
Thomas Mann may exaggerate when he suggests [69] that “... we are actually 
bringing about what seems to be happening to us,” but Leibniz [70] reassures us, 
“Although the whole of this life were said to be nothing but a dream and the physical 
world nothing but a phantasm, I should call this dream or phantasm real enough 
if, using reason well, we were never deceived by it.” 


Second no: No laws. “So far as we can see today, the laws of physics cannot 
have existed from everlasting to everlasting. They must have come into being at 
the big bang. There were no gears and pinions, no Swiss watch-makers to put 
things together, not even a pre-existing plan... Only a principle of organization 
which is no organization at all would seem to offer itself. In all of mathematics, 
nothing of this kind more obviously offers itself than the principle that ‘the boundary 
of boundary is zero.’ Moreover, all three great field theories of physics use this 
principle twice over... This circumstance would seem to give us some reassurance 
that we are talking sense when we think of... physics being” [32] as foundation-free 
as a logic loop, the closed circuit of ideas in a self-referential deductive axiomatic 
system [71-74]. 


Universe as machine? This universe one among a great ensemble of machine 
universes, each differing from the others in the values of the dimensionless constants 
of physics? Our own selected from this ensemble by an anthropic principle of one 
or another form [75]? We reject here the concept of universe as machine not least 
because it “has to postulate explicitly or implicitly, a supermachine, a scheme, 
a device, a miracle, which will turn out universes in infinite variety and infinite 
number” [47]. 


Directly opposite to the concept of universe as machine built on law is the vision 
of a world self-synthesized. On this view, the notes struck out on a piano by the 
observer-participants of all places and all times, bits though they are, in and by 
themselves constitute the great wide world of space and time and things. 


Third no: No continuum. No continuum in mathematics and therefore no con- 
tinuum in physics. A half-century of development in the sphere of mathematical 
logic [76] has made it clear that there is no evidence supporting the belief in the ex- 
istential character of the number continuum. “Belief in this transcendental world,” 
Hermann Weyl tells us, “taxes the strength of our faith hardly less than the
doctrines of the early Fathers of the Church or of the scholastic philosophers of the
Middle Ages” [77]. This lesson out of mathematics applies with equal strength 
to physics. “Just as the introduction of the irrational numbers... is a convenient 
myth [which] simplifies the laws of arithmetic... so physical objects,” Willard Van 
Orman Quine tells us [78] “are postulated entities which round out and simplify 
our account of the flux of existence... The conceptual scheme of physical objects
is a convenient myth, simpler than the literal truth and yet containing that literal 
truth as a scattered part.” 


Nothing so much distinguishes physics as conceived today from mathematics as 
the difference between the continuum character of the one and the discrete charac- 
ter of the other. Nothing does so much to extinguish this gap as the elementary 
quantum phenomenon “brought to a close,” as Bohr puts it [10] by “an irreversible 
act of amplification,” such as the click of a photodetector or the blackening of 
a grain of photographic emulsion. Irreversible? More than one idealized experi- 
ment [38] illustrates how hard it is, even today, to give an all-inclusive definition of 
the term irreversible. Those difficulties supply pressure, however, not to retreat to 
old ground, but to advance to new insight. In brief, continuum-based physics, no; 
information-based physics, yes. 


Fourth and last no: No space, no time. Heaven did not hand down the word 
“time”. Man invented it, perhaps positing hopefully as he did that “Time is Nature’s 
way to keep everything from happening all at once” [79]. If there are problems with 
the concept of time, they are of our own creation! As Leibniz tells us, [80] “... time 
and space are not things, but orders of things... ;” or as Einstein put it, [81] “Time 
and space are modes by which we think, and not conditions in which we live.” 


What are we to say about that weld of space and time into spacetime which 
Einstein gave us in his 1915 and still standard classical geometrodynamics? On this 
geometry quantum theory, we know, imposes fluctuations [13, 14, 82]. Moreover, the 
predicted fluctuations grow so great at distances of the order of the Planck length 
that in that domain they put into question the connectivity of space and deprive 
the very concepts of “before” and “after” of all meaning [83]. This circumstance 
reminds us anew that no account of existence can ever hope to rate as fundamental 
which does not translate all of continuum physics into the language of bits. 


We will not feed time into any deep-reaching account of existence. We must 
derive time — and time only in the continuum idealization — out of it. Likewise 
with space. 


19.4 Five Clues 


First clue. The boundary of a boundary is zero. This central principle of algebraic 
topology [84], identity, triviality, tautology, though it is, is also the unifying theme 
of Maxwell electrodynamics, Einstein geometrodynamics and almost every version 
of modern field theory [42], [85-88]. That one can get so much from so little, almost 
everything from almost nothing, inspires hope that we will someday complete the 
mathematization of physics and derive everything from nothing, all law from no 
law. 


Second clue: No question, no answer. Better put, no bit-level question, no
bit-level answer. So it is in the game of twenty questions in its surprise version [89].
And so it is for the electron circulating within the atom or a field within a space.
To neither field nor particle can we attribute a coordinate or momentum until a
device operates to measure the one or the other. Moreover any apparatus that
accurately [90] measures the one quantity inescapably rules out then and there the
operation of equipment to measure the other [9, 91, 92]. In brief, the choice of
question asked, and choice of when it’s asked, play a part — not the whole part,
but a part — in deciding what we have the right to say [38, 43].


Bit-registration of a chosen property of the electron, a bit-registration of the 
arrival of a photon, Aharonov-Bohm bit-based determination of the magnitude of a 
field flux, bulk-based count of bits bound in a blackhole: All are examples of physics 
expressed in the language of information. However, into a bit count that one might 
have thought to be a private matter, the rest of the nearby world irresistibly thrusts 
itself. Thus the atom-to-atom distance in a ruler — basis for a bit count of distance 
— evidently has no invariant status, depending as it does on the temperature and 
pressure of the environment. Likewise the shift of fringes in the Aharonov-Bohm 
experiment depends not only upon the magnetic flux itself, but also on the charge 
of the electron. But this electron charge — when we take the quantum itself to 
be Nature’s fundamental measuring unit — is governed by the square root of the 
quantity e²/ħc = 1/137.036..., a “constant” which — for extreme conditions —
is as dependent on the local environment [93] as is a dielectric “constant” or the
atom-to-atom spacing in the ruler. 


The contribution of the environment becomes overwhelmingly evident when we 
turn from length of bar or flux of field to the motion of alpha particle through cloud 
chamber, dust particle through 3°K-background radiation or Moon through space. 
This we know from the analyses of Bohr and Mott [94], Zeh [95, 96], Joos and
Zeh [97], Zurek [98-100] and Unruh and Zurek [101]. It from bit, yes; but the rest
of the world also makes a contribution, a contribution that suitable experimental 
design can minimize but not eliminate. Unimportant nuisance? No. Evidence the 
whole show is wired up together? Yes. Objection to the concept of every it from 
bits? No. 


Build physics, with its false face of continuity, on bits of information! What 
this enterprise is we perhaps see more clearly when we examine for a moment a 
thoughtful, careful, wide-reaching exposition [102] of the directly opposite thesis, 
that physics at bottom is continuous; that the bit of information is not the basic en- 
tity. Rate as false the claim that the bit of information is the basic entity. Instead, 
attempt to build everything on the foundation of some “grand unified field theory” 
such as string theory [103, 104] — or in default of that, on Einstein’s 1915 and still 
standard geometrodynamics. Hope to derive that theory by way of one or another 
plausible line of reasoning. But don’t try to derive quantum theory. Treat it as 
supplied free of charge from on high. Treat quantum theory as a magic sausage 
grinder which takes in as raw meat this theory, that theory or the other theory 
and turns out a “wave equation,” one solution of which is “the” wave function for
the universe [14, 102, 105-107]. From start to finish accept continuity as right and 
natural: Continuity in the manifold, continuity in the wave equation, continuity in 
its solution, continuity in the features that it predicts. Among conceivable solutions 
of this wave equation select as reasonable one which “maximally decoheres,” one 
which exhibits “maximal classicity” — maximal classicity by reason, not of
“something external to the framework of wave function and Schrödinger equation,” but
something in “the initial conditions of the universe specified within quantum theory 
itself.” 


How compare the opposite outlooks of decoherence and it-from-bit? Remove 
the casing that surrounds the workings of a giant computer. Examine the bundles 
of wires that run here and there. What is the status of an individual wire? Math- 
ematical limit of bundle? Or building block of bundle? The one outlook regards 
the wave equation and wave function to be primordial and precise and built on 
continuity, and the bit to be idealization. The other outlook regards the bit to be 
the primordial entity, and wave equation and wave function to be secondary and 
approximate — and derived from bits via information theory. 


Derived, yes; but how? No one has done more than William Wootters towards 
opening up a pathway [108, 109] from information to quantum theory. He puts 
into connection two findings, long known, but little known. Already before the 
advent of wave mechanics, he notes, the analyst of population statistics R. A. Fisher 
proved [110, 111] that the proper tool to distinguish one population from another is
not the probability of this gene, that gene and the third gene (for example), but the 
square roots of these probabilities; that is to say, the two probability amplitudes, 
each probability amplitude being a vector with three components. More precisely, 
Wootters proves, the distinguishability between the two populations is measured by 
the angle in Hilbert space between the two state vectors, both real. Fisher, however, 
was dealing with information that sits “out there”. In microphysics, however, the 
information does not sit out there. Instead, Nature in the small confronts us with 
a revolutionary pistol, “No question, no answer.” Complementarity rules. And 
complementarity as E.C.G. Stueckelberg proved [112, 113] as long ago as 1952, 
and as Saxon made more readily understandable [114] in 1964, demands that the 
probability amplitudes of quantum physics must be complex. Thus Wootters derives 
familiar Hilbert space with its familiar complex probability amplitudes from the twin 
demands of complementarity and measure of distinguishability. 
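
The measure of distinguishability described above can be made concrete in a few lines. The following is a minimal sketch under stated assumptions: the two gene-frequency distributions are hypothetical, the function names are illustrative, and only the real-amplitude case treated by Fisher is shown, not the complex amplitudes that complementarity demands.

```python
import math


def amplitudes(probabilities):
    """Square roots of the probabilities: a real unit vector of amplitudes."""
    return [math.sqrt(p) for p in probabilities]


def statistical_angle(p, q):
    """Angle between the two amplitude vectors, used as the distinguishability."""
    overlap = sum(a * b for a, b in zip(amplitudes(p), amplitudes(q)))
    # Clamp against rounding error before taking the arc cosine.
    return math.acos(min(1.0, max(-1.0, overlap)))


# Hypothetical frequencies of three genes in two populations.
population_a = [0.5, 0.3, 0.2]
population_b = [0.2, 0.3, 0.5]

angle_ab = statistical_angle(population_a, population_b)
angle_aa = statistical_angle(population_a, population_a)
angle_max = statistical_angle([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])

print(f"angle between the two populations: {angle_ab:.4f} rad")
print(f"angle of a population with itself: {angle_aa:.2e} rad")
print(f"angle for non-overlapping populations: {angle_max:.4f} rad")
```

Identical populations give zero angle and populations with no gene in common give a right angle, so the angle behaves as a distance between the two state vectors.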


Try to go on from Wootters’s finding to deduce the full blown machinery of 
quantum field theory? Exactly not to try to do so — except as idealization — is 
the demand laid on us by the concept of it from bit. How come? 


Probabilities exist “out there” no more than do space or time or the position of 
the atomic electron. Probability, like time, is a concept invented by humans, and 
humans have to bear the responsibility for the obscurities that attend it. Obscurities 
there are whether we consider probability defined as frequency [115] or defined à la
Bayes [116-119]. Probability in the sense of frequency has no meaning as applied 
to the spontaneous fission of the particular plutonium nucleus that triggered the 
November 1, 1952 H-bomb blast. 


What about probabilities of a Bayesian cast, probabilities “interpreted not as fre- 
quencies observable through experiments, but as degrees of plausibility one assigns 
to each hypothesis based on the data and on one’s assessment of the plausibility 
of the hypotheses prior to seeing the data” [120]. Belief-dependent probabilities, 
different probabilities assigned to the same proposition by different people [121]? 
Probabilities associated [122] with the view that “objective reality is simply an 
interpretation of data agreed to by large numbers of people?” 


Heisenberg directs us to the experiences [123] of the early nuclear-reaction-rate 
theorist Fritz Houtermans, imprisoned in Kharkov during the time of the Stalin 
terror, “... the whole cell would get together to produce an adequate confession 
... [and] helped them [the prisoners] to compose their ‘legends’ and phrase them 
properly, implicating as few others as possible.” 


Existence as confession? Myopic but in some ways illuminating formulation of 
the demand for intercommunication implicit in the theme of it from bit! 


So much for “No question, no answer.” 


Third clue. The super-Copernican principle [47]. This principle rejects now- 
centeredness in any account of existence as firmly as Copernicus repudiated here- 
centeredness. It repudiates most of all any tacit adoption of here-centeredness in 
assessing observer-participants and their number. 


What is an observer-participant? One who operates an observing device and 
participates in the making of meaning, meaning in the sense of Føllesdal [124],
“Meaning is the joint product of all the evidence that is available to those who 
communicate.” Evidence that is available? The investigator slices a rock and pho- 
tographs the evidence for the heavy nucleus that arrived in the cosmic radiation of a 
billion years ago [38]. Before he can communicate his findings, however, an asteroid 
atomizes his laboratory, his records, his rocks and him. No contribution to meaning! 
Or at least no contribution then. A forensic investigation of sufficient detail and 
wit to reconstruct the evidence of the arrival of that nucleus is difficult to imagine. 
What about the famous tree that fell in the forest with no one around [125]? It 
leaves a fallout of physical evidence so near at hand and so rich that a team of up- 
to-date investigators can establish what happened beyond all doubt. Their findings 
contribute to the establishment of meaning. 


“Measurements and observations,” it has been said, [102] “cannot be funda- 
mental notions in a theory which seeks to discuss the early universe when neither 
existed.” On this view the past has a status beyond all questions of observer- 
participancy. It from bit offers us a different vision: “Reality is theory” [126]; “the 
past has no evidence except as it is recorded in the present” [127]. The photon that
we are going to register tonight from that four billion-year old quasar cannot be said 
to have had an existence “out there” three billion years ago, or two (when it passed 
an intervening gravitational lens) or one, or even a day ago. Not until we have fixed 
arrangements at our telescope do we register tonight’s quantum as having passed 
to the left (or right) of the lens or by both routes (as in a double slit experiment). 
This registration, like every delayed-choice experiment [21, 40], reminds us that no
elementary quantum phenomenon is a phenomenon until, in Bohr’s words [10], “It 
has been brought to a close” by “an irreversible act of amplification.” What we call 
the past is built on bits. 


Enough bits to structure a universe so rich in features as we know this world to 
be. Preposterous! Mice and men and all on Earth who may ever come to rank as 
intercommunicating meaning-establishing observer-participants will never mount a 
bit count sufficient to bear so great a burden. 


The count of bits needed, huge though it may be, nevertheless, so far as we 
can judge, does not reach infinity. In default of a better estimate, we follow fa- 
miliar reasoning [128] and translate into the language of the bits the entropy of 
the primordial cosmic fireball as deduced from the entropy of the present 2.735°K 
(uncertainty <0.05°K) microwave relict radiation [129] totaled over a 3-sphere of 
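radius 13.2 × 10⁹ light years (uncertainty <35%) [130] or 1.25 × 10²⁸ cm and of
volume 2π² radius³,


$$
\begin{aligned}
(\text{number of bits}) &= (\log_2 e) \times (\text{number of nats}) \\
&= (\log_2 e) \times (\text{entropy} / \text{Boltzmann's constant, } k) \\
&= 1.44\ldots \times \left[ \frac{8\pi^4}{45} \left( \text{radius} \cdot \frac{kT}{\hbar c} \right)^{3} \right] \\
&= 8 \times 10^{88}
\end{aligned}
\tag{19.3}
$$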
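
The arithmetic of Eq. (19.3) can be reproduced directly. This sketch takes the temperature 2.735 K and the radius 1.25 × 10²⁸ cm from the text; the CGS values of Boltzmann's constant, ħ and c are standard reference values supplied here as assumptions. It evaluates (8π⁴/45)(radius · kT/ħc)³ nats and converts to bits with the factor log₂ e.

```python
import math

# CGS constants (standard reference values, not quoted in the text).
K_BOLTZMANN = 1.380649e-16   # erg/K
HBAR = 1.054571817e-27       # erg s
C = 2.99792458e10            # cm/s

# Inputs quoted in the text.
TEMPERATURE_K = 2.735        # microwave relict radiation
RADIUS_CM = 1.25e28          # radius of the 3-sphere

# Dimensionless combination radius * kT / (hbar * c).
x = RADIUS_CM * K_BOLTZMANN * TEMPERATURE_K / (HBAR * C)

# Entropy over k of blackbody radiation filling the volume 2*pi^2*radius^3.
nats = (8 * math.pi ** 4 / 45) * x ** 3
bits = math.log2(math.e) * nats

print(f"log2(e)        = {math.log2(math.e):.4f}")
print(f"number of nats ~ {nats:.2e}")
print(f"number of bits ~ {bits:.2e}")
```

With these inputs the result lands close to the 8 × 10⁸⁸ of Eq. (19.3); given the stated 35% uncertainty in the radius, which enters cubed, only the order of magnitude is meaningful.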


It would be totally out of place to compare this overpowering number with the num- 
ber of bits of information elicited up to date by observer-participancy. So warns 
the super-Copernican principle. We today, to be sure, through our registering de- 
vices, give a tangible meaning to the history of the photon that started on its way 
from a distant quasar long before there was any observer-participancy anywhere. 
However, the far more numerous establishers of meaning of time to come have a 
like inescapable part — by device-elicited questions and registration of answer —
in generating the “reality” of today. For this purpose, moreover, there are bil- 
lions of years yet to come, billions on billions of sites of observer-participancy yet 
to be occupied. How far foot and ferry have carried meaning-making communica- 
tion in fifty thousand years gives faint feel for how far interstellar propagation is 
destined [131, 132] to carry it in fifty billion years.


Do bits needed balance bits achievable? They must, declares the concept of 
“world as system self-synthesized by quantum networking” [47]. By no prediction 
does this concept more clearly expose itself to destruction, in the sense of
Popper [133].


Fourth clue: “Consciousness”. We have traveled what may seem a dizzying path.
First, elementary quantum phenomenon brought to a close by an irreversible act 
of amplification. Second, the resulting information expressed in the form of bits. 
Third, this information used by observer-participants — via communication — to 
establish meaning. Fourth, from the past through the billenniums to come, so many
observer-participants, so many bits, so much exchange of information, as to build 
what we call existence. 


Doesn’t this it-from-bit view of existence seek to elucidate the physical world, 
about which we know something, in terms of an entity about which we know almost 
nothing, consciousness [134-137]? And doesn’t Marie Sklodowska Curie tell us, 
“Physics deals with things, not people?” Using such and such equipment, making 
such and such a measurement, I get such and such a number. Who I am has nothing
to do with this finding. Or does it? Am I sleepwalking [138, 139]? Or am I one 
of those poor souls without the critical power to save himself from pathological 
science [140-142]? 


Under such circumstances any claim to have “measured” something falls flat 
until it can be checked out with one’s fellows. Checked how? Morton White
reminds us [143] how the community applies its tests of credibility, and in this
connection quotes analyses by Chauncey Wright, Josiah Royce and Charles Sanders
Peirce [144]. Parmenides of Elea [145] (≈ 515 B.C.–450+ B.C.) may tell us that
“What is... is identical with the thought that recognizes it.” We, however, steer 
clear of the issues connected with “consciousness.” The line between the uncon- 
scious and the conscious begins to fade [146] in our day as computers evolve and 
develop — as mathematics has — level upon level upon level of logical structure. 
We may someday have to enlarge the scope of what we mean by a “who”. This 
granted, we continue to accept — as essential part of the concept of it from bit — 
Føllesdal’s guideline [124], “Meaning is the joint product of all the evidence that is
available to those who communicate.” What shall we say of a view of existence [147] 
that appears, if not anthropomorphic in its use of the word “who,” still overly cen- 
tered on life and consciousness? It would seem more reasonable to dismiss for the 
present the semantic overtones of “who” and explore and exploit the insights to be 
won from the phrases, “communication” and “communication employed to establish 
meaning.” 


Føllesdal’s statement supplies, not an answer, but the doorway to new questions.
For example, man has not yet learned how to communicate with ant. When he 
does, will the questions put to the world around by the ant and the answers that he 
elicits contribute their share, too, to the establishment of meaning? As another issue 
associated with communication, we have yet to learn how to draw the line between 
a communication network that is closed, or parochial, and one that is open. And 
how to use that difference to distinguish between reality and poker — or another 
game [148, 149] — so intense as to appear more real than reality. No term in 
Føllesdal’s statement poses greater challenge to reflection than “communication,”
descriptor of a domain of investigation [150-152] that enlarges in sophistication with
each passing year. 


Fifth and final clue: More is different [153]. Not by plan but by inner necessity 
a sufficiently large number of H2O molecules collected in a box will manifest solid, 
liquid and gas phases. Phase changes, superfluidity and superconductivity all bear 
witness to Anderson’s pithy point, more is different. 


We do not have to turn to objects so material as electrons, atoms and molecules
to see big numbers generating new features. The evolution from small to large has 
already in a few decades forced on the computer a structure [154, 155] reminiscent
of biology by reason of its segregation of different activities into distinct organs. 
Distinct organs, too, the giant telecommunications system of today finds itself in- 
escapably evolving [151, 152]. Will we someday understand time and space and all 
the other features that distinguish physics — and existence itself — as the similarly 
self-generated organs of a self-synthesized information system [156-158]? 


19.5 Conclusion 


The spacetime continuum? Even continuum existence itself? Except as idealiza- 
tion neither the one entity nor the other can make any claim to be a primordial 
category in the description of Nature. It is wrong, moreover, to regard this or that 
physical quantity as sitting “out there” with this or that numerical value in default 
of question asked and answer obtained by way of appropriate observing device. 
The information thus solicited makes physics and comes in bits. The count of bits 
drowned in the dark night of a blackhole displays itself as horizon area, expressed 
in the language of Bekenstein number. The bit count of the cosmos, however it is 
figured, is ten raised to a very large power. So also is the number of elementary 
acts of observer-participancy over any time of the order of fifty billion years. And, 
except via those time-leaping quantum phenomena that we rate as elementary acts 
of observer-participancy, no way has ever offered itself to construct what we call 
“reality.” That’s why we take seriously the theme of it from bit. 


19.6 Agenda


Intimidating though the problem of existence continues to be, the theme of it from
bit breaks it down into six issues that invite exploration: 


One: Go beyond Wootters and determine what, if anything, has to be added to 
distinguishability and complementarity to obtain all of standard quantum theory. 


Two: Translate the quantum versions of string theory and of Einstein’s ge- 
ometrodynamics from the language of continuum to the language of bits. 


Three: Sharpen the concept of bit. Determine whether “an elementary quantum 
phenomenon brought to a close by an irreversible act of amplification” has at bottom
(1) the 0-or-1 sharpness of definition of bit number nineteen in a string of binary
digits, or (2) the accordion property of a mathematical theorem, the length of which, 
that is, the number of supplementary lemmas contained in which, the analyst can 
stretch or shrink according to his convenience. 


Four: Survey one by one with an imaginative eye the powerful tools that
mathematics — including mathematical logic — has won and now offers to deal with
theorems on a wholesale rather than a retail level, and for each such technique 
work out the transcription into the world of bits. Give special attention to one and 
another deductive axiomatic system which is able to refer to itself [159], one and 
another self-referential deductive system. 


Five: From the wheels-upon-wheels-upon-wheels evolution of computer
programming dig out, systematize and display every feature that illuminates the
level-upon-level-upon-level structure of physics.


Six: Capitalize on the findings and outlooks of information theory [160-163],
algorithmic entropy [164], evolution of organisms [165-167] and pattern
recognition [168-175]. Search out every link each has with physics at the quantum level.
Consider, for instance, the string of bits 1111111... and its representation as the 
sum of the two strings 1001110... and 0110001... Explore and exploit the connec- 
tion between this information-theoretic statement and the findings of theory and 
experiment on the correlation between the polarizations of the two photons emitted 
in the annihilation of singlet positronium [176] and in like Einstein-Podolsky-Rosen 
experiments [177]. Seek out, moreover, every realization in the realm of physics of 
the information-theoretic triangle inequality recently discovered by Zurek [178]. 
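
The bit-string statement in issue Six is easy to verify. The sketch below illustrates only the arithmetic: the two strings are those given in the text, while the function names and the randomly generated pair are hypothetical additions. It makes no claim to model the photon-polarization experiments themselves.

```python
import random


def bitwise_sum(a, b):
    """Add two equal-length bit strings position by position, without carries."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return "".join(str(int(x) + int(y)) for x, y in zip(a, b))


def complement(bits):
    """Flip every bit of the string."""
    return "".join("1" if ch == "0" else "0" for ch in bits)


# The two strings quoted in the text.
first, second = "1001110", "0110001"
total = bitwise_sum(first, second)
print(first, "+", second, "=", total)

# Hypothetical extension: any string paired with its complement has the same sum,
# even though each string taken alone looks patternless.
rng = random.Random(0)
random_string = "".join(rng.choice("01") for _ in range(16))
partner = complement(random_string)
print(random_string, "+", partner, "=", bitwise_sum(random_string, partner))
```

Each string alone carries no visible regularity, yet the pair is perfectly anticorrelated position by position, which is the feature the text invites us to compare with the two-photon correlations.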


Finally: Deplore? No, celebrate the absence of a clean clear definition of the 
term “bit” as elementary unit in the establishment of meaning. We reject “that 
view of science which used to say, ‘Define your terms before you proceed.’ The 
truly creative nature of any forward step in human knowledge,” we know, “is such 
that theory, concept, law and method of measurement — forever inseparable — are 
born into the world in union [179].” If and when we learn how to combine bits in 
fantastically large numbers to obtain what we call existence, we will know better 
what we mean both by bit and by existence. 


A single question animates this report: Can we ever expect to understand ex- 
istence? Clues we have, and work to do, to make headway on that issue. Surely 
someday, we can believe, we will grasp the central idea of it all as so simple, so 
beautiful, so compelling that we will all say to each other, “Oh, how could it have 
been otherwise! How could we all have been so blind so long!”


Discussion


A discussion followed: 


N. G. van Kampen: Did you mean to say that the observer influences the ob- 
served object? 


J. A. Wheeler: The observer does not influence the past. Instead, by his choice
of question, he decides about what feature of the object he shall have the right to 
make a clear statement. 


J. P. Vigier: Two problems.


1. The first is that the QSO raise lots of unsolved problems, i.e. — strange 
quantized No/log(1 + z) relation — correlation with galaxies (Arp) — angu- 
lar correlation with brightest nearby galaxies (Burbidge et al.) 


2. The second is that the idea (Einstein et al.) of the reality of fields has led 
(assuming that “particles” are field singularities) to the only known justifica- 
tion of the geodesic law. To contest it is to make the meaning of dynamical 
behaviour purely observer-dependent, i.e., to kill the reality of the physical 
world. 
J. A. Wheeler: 


1. The book by Thorne and colleagues, “Black Holes: The Membrane Paradigm,” 
describes how a supermassive black hole, endowed via accretion with great 
angular momentum inside and an accretion disk outside, produces counter- 
directed jets and radiation of great power. I know no other mechanism able 
to produce quasars. 


2. No one has discovered a way to get a particle of wavelength λ from point 
A through empty flat space to a point B at a great distance L without its 
undergoing on the way a transverse spread of the order √(Lλ). This spread 
imposes an inescapable limitation on the classical concept of “worldline.” 


References 


[1] J. Kepler (1571-1630): Harmonices Mundi, 5 books (1619). The appendix to Ke- 
pler’s Book 5 contains one side, the publications of the English physician and 
thinker Robert Fludd (1574-1637) the other side, of a great debate, analysed by 
Wolfgang Pauli [W. Pauli: “Der Einfluss archetypischer Vorstellungen auf die Bil- 
dung naturwissenschaftlicher Theorien bei Kepler” in Naturerklärung und Psyche 
(Rascher, Zürich, 1952) p.109-194; reprinted in Wolfgang Pauli: Collected Scientific 
Papers, eds. R. Kronig and V. F. Weisskopf (Interscience-Wiley, New York, 1964) 
Vol.1, p.1023]. Totally in contrast to Fludd’s concept of intervention from on high 
[Utriusque Cosmi Maioris scilicet et Minoris Metaphysica, Physica atque technica 
Historia, 1st ed. (Oppenheim, 1621)] was Kepler’s guiding principle, Ubi materia, 
ibi geometria (where there is matter, there is geometry). It was not directly from 
Kepler’s writings, however, that Newton learned of Kepler’s three great geometry- 
driven findings about the motions of the planets in space and in time, but from the 
distillation of Kepler offered by Thomas Streete (1622-1689), Astronomia Carolina: 
A New Theorie of the Celestial Motions (London, 1661). 


[2] I. Newton: Philosophiae naturalis principia mathematica, 1st ed. (London, 1687). 


[3] A. Einstein: “Zur allgemeinen Relativitätstheorie” Preuss. Akad. Wiss. Berlin, 
Sitzber (1915) p.799-801; also (1915) p. 832-839, 844-847; (1916) p.688-696 and 
(1917) p.142-152. 


[4] J. A. Wheeler: Journey into Gravity and Spacetime (Scientific American Library, 
Freeman, New York, 1990), cited hereafter as JGST, offers a brief and accessible 
summary of Einstein’s 1915 and still standard geometrodynamics which capitalizes 
on Elie Cartan’s appreciation of the central idea of the theory: the boundary of a 
boundary is zero. 


[5] J. G. Mendel: “Versuche über Pflanzenhybriden” Verhandlungen des Naturforschen- 
den Vereins in Brünn 4 (1866). 


[6] C. R. Darwin (1809-1882): On the Origin of Species by Means of Natural Selection, 
or the Preservation of Favoured Races in the Struggle for Life (London, 1859). 


[7] J. D. Watson and F. H. C. Crick: “Molecular structure of nucleic acids: a structure 
for deoxyribose nucleic acid” Nature 171 (1953) 737-738. 


[8] M. Planck: “Zur Theorie des Gesetzes der Energieverteilung im Normalspektrum” 
Verhand. Deutschen Phys. Gesell. 2 (1900) 237-245. 


[9] N. Bohr: “The quantum postulate and the recent development of atomic theory” 
Nature 121 (1928) 580-590. Reprinted in J. A. Wheeler and W. H. Zurek: Quantum 
Theory and Measurement (Princeton University Press, 1983) p.87-126, referred to 
hereafter as WZ. The mathematics of complementarity I have not been able to 
discover stated anywhere more sharply, more generally and earlier than in H. Weyl: 
Gruppentheorie und Quantenmechanik (Hirzel, Leipzig, 1928), in the statement that 
the totality of operators for all the physical quantities of the system in question form 
an irreducible set. 


[10] N. Bohr: “Can quantum-mechanical description of physical reality be considered 
complete?” Phys. Rev. 48 (1935) 696-702; reprinted in WZ, p. 145-151. 


[11] A. Einstein to J. J. Laub, 1908, undated, Einstein Archives; scheduled for publication 
in The Collected Papers of Albert Einstein, group of volumes on the Swiss years 1902- 
1914, Volume 5: Correspondence, 1902-1914 (Princeton University Press, Princeton, 
New Jersey). 


[12] J. A. Wheeler: “Assessment of Everett’s “relative state” formulation of quantum the- 
ory” Rev. Mod. Phys. 29 (1957) 463-465. 


[13] J. A. Wheeler: “On the nature of quantum geometrodynamics,” Ann. of Phys. 2 
(1957) 604-614. 


[14] J. A. Wheeler: “Superspace and the nature of quantum geometrodynamics,” in 
Battelle Rencontres: 1967 Lectures in Mathematics and Physics, eds. C. M. DeWitt 
and J. A. Wheeler, (Benjamin, New York, 1968) p. 242-307; reprinted as “Le su- 
perspace et la nature de la géométrodynamique quantique,” in Fluides et Champ 
Gravitationnel en Relativité Générale, No. 170, Colloques Internationaux (Editions 
de Centre National de la recherche Scientifique, Paris, 1969) p.257-322. 


[15] J. A. Wheeler: “Transcending the law of conservation of leptons,” in Atti del Con- 
vegno Internazionale sul Tema: The Astrophysical Aspects of the Weak Interac- 
tions (Cortona “Il Palazzone,” 10-12 Giugno 1970), Accademia Nazionale dei Lincei, 
Quaderno N. 157 (1971) p.133-64. 


[16] C. W. Misner, K. S. Thorne and J. A. Wheeler: Gravitation (Freeman, San Fran- 
cisco, now New York, 1973) p. 1217, cited hereafter as MTW; paragraph on par- 
ticipatory concept of the universe. 


[17] J. A. Wheeler: “The universe as home for man,” in The Nature of Scientific Discov- 
ery, ed. O. Gingerich (Smithsonian Institution Press, Washington, 1975) p. 261-296; 
preprinted in part in American Scientist, 62 (1974) 683-91; reprinted in part as T. 
P. Snow, The Dynamic Universe (West, St. Paul, Minnesota, 1983) p.108-109. 


[18] C. M. Patton and J. A. Wheeler: “Is physics legislated by cosmogony?,” in Quantum 
Gravity, eds. C. Isham, R. Penrose and D. Sciama (Clarendon, Oxford, 1975) p. 
538-605; reprinted in part in Encyclopaedia of Ignorance, eds. R. Duncan and M. 
Weston-Smith (Pergamon, Oxford, 1977) p.19-35. 


[19] J. A. Wheeler: “Include the observer in the wave function?” Fundamenta Scientiae: 
Seminaire sur les fondements des sciences (Strasbourg) 25 (1976) 9-35; reprinted in 
Quantum Mechanics A Half Century Later, eds. J. Leite Lopes and M. Paty (Reidel, 
Dordrecht, 1977) p.1-18. 


[20] J. A. Wheeler: “Genesis and observership,” in Foundational Problems in the Special 
Sciences, eds. R. Butts and J. Hintikka (Reidel, Dordrecht, 1977) p.1-33. 


[21] J. A. Wheeler: “The “past” and the “delayed choice” double-slit experiment,” in 
Mathematical Foundations of Quantum Theory, ed. A. R. Marlow (Academic, New 
York, 1978) p.9-48; reprinted in part in WZ, p.182-200. 


[22] J. A. Wheeler: “Frontiers of time,” in Problems in the Foundations of Physics, 
Proceedings of the International School of Physics “Enrico Fermi” (Course 72), ed. 
N. Toraldo di Francia (North-Holland, Amsterdam, 1979) p.395-497; reprinted in 
part in WZ, p. 200-208. 


[23] J. A. Wheeler: “The quantum and the universe,” in Relativity, Quanta, and Cos- 
mology in the Development of the Scientific Thought of Albert Einstein, Vol. II, eds. 
M. Pantaleo and F. deFinis (Johnson Reprint Corp., New York, 1979) p.807-825. 


[24] J. A. Wheeler: “Beyond the black hole,” in Some Strangeness in the Proportion: 
A Centennial Symposium to Celebrate the Achievements of Albert Einstein, ed. H. 
Woolf (Addison-Wesley, Reading, Massachusetts, 1980) p.341-375; reprinted in part 
in WZ, p.208-210. 


[25] J. A. Wheeler: “Pregeometry: motivations and prospects,” in Quantum Theory and 
Gravitation, proceedings of a symposium held at Loyola University, New Orleans, 
May 23-26, 1979, ed. A. R. Marlow (Academic, New York, 1980) p.1-11. 


[26] J. A. Wheeler: “Law without law,” in Structure in Science and Art, eds. P. Medawar 
and J. Shelley (Elsevier North-Holland, New York and Excerpta Medica, Amster- 
dam, 1980) p.132-54. 


[27] J. A. Wheeler: “Delayed-choice experiments and the Bohr-Einstein dialog,” in 
American Philosophical Society and the Royal Society: Papers read at a meeting, 
June 5, 1980 (American Philosophical Society, Philadelphia, 1980) p.9-40; reprinted 
in slightly abbreviated form and translated into German as “Die Experimente der 
verzögerten Entscheidung und der Dialog zwischen Bohr und Einstein,” in Moderne 
Naturphilosophie ed. B. Kanitscheider (Königshausen and Neumann, Würzburg, 
1984) p. 203-222; reprinted in Niels Bohr: A Profile, eds. A. N. Mitra, L. S. Kothari, 
V. Singh and S. K. Trehan (Indian National Science Academy, New Delhi, 1985) 
p.139-168. 


[28] J. A. Wheeler: “Not consciousness but the distinction between the probe and the 
probed as central to the elemental quantum act of observation,” in The Role of 
Consciousness in the Physical World, ed. R. G. Jahn (Westview, Boulder, 1981) p. 
87-111. 


[29] J. A. Wheeler: “The elementary quantum act as higgledy-piggledy building mecha- 
nism,” in Quantum Theory and the Structures of Time and Space, Papers presented 
at a Conference held in Tutzing, July, 1980, eds. L. Castell and C. F. von Weizsäcker 
(Carl Hanser, Munich, 1981) p.27-304. 


[30] J. A. Wheeler: “The computer and the universe,” Int. J. Theo. Phys. 21 (1982) 
557-571. 


[31] J. A. Wheeler: “Bohr, Einstein, and the strange lesson of the quantum,” in Mind in 
Nature, Nobel Conference XVII, Gustavus Adolphus College, St. Peter, Minnesota, 
ed. Richard Q. Elvee (Harper and Row, New York, 1982) p.1-30 (also 88, 112, 113, 
130-131, 148-40). 


[32] J. A. Wheeler: Physics and Austerity (in Chinese) (Anhui Science and Technology 
Publications, Anhui, China, 1982); reprinted in part (Lecture II), in Krisis, Vol.1, 
No.2, ed. I. Marculescu (Klincksieck, Paris, 1983) p.671-75. 


[33] J. A. Wheeler: “Particles and geometry,” in Unified Theories of Elementary Parti- 
cles, eds. P. Breitenlohner and H. P. Dürr (Springer, Berlin, 1982) p.189-217. 


[34] J. A. Wheeler: “Blackholes and new physics,” in Discovery: Research and Scholar- 
ship at the University of Texas at Austin, 7, No.2 (Winter 1982) 47. 


[35] J. A. Wheeler: “On recognizing law without law,” Am. J. Phys. 51 (1983) 398-404. 


[36] J. A. Wheeler: “Jenseits aller Zeitlichkeit,” in Die Zeit, Schriften der Carl Friedrich 
von Siemens-Stiftung, Vol. 6, eds. A. Peisl and A. Mohler (Oldenbourg, München, 
1983) p.17-34. 


[37] J. A. Wheeler: “Elementary quantum phenomenon as building unit,” in Quantum 
Optics, Experimental Gravitation, and Measurement Theory, eds. P. Meystre and 
M. Scully (Plenum, New York and London, 1983) p.141-143. 


[38] J. A. Wheeler: “Bits, quanta, meaning,” in Problems in Theoretical Physics, eds. 
A. Giovannini, F. Mancini and M. Marinaro (University of Salerno Press, Salerno, 
1984) p. 121-141; also in Theoretical Physics Meeting: Atti del Convegno, Amalfi, 
6-7 maggio 1983 (Edizioni Scientifiche Italiane, Naples, 1984) p. 121-134; also in 
Festschrift in Honour of Eduardo R. Caianiello, eds. A. Giovannini, F. Mancini, M. 
Marinaro, and A. Rimini (World Scientific, Singapore, 1989) p.133-154. 


[39] J. A. Wheeler: “Quantum gravity: the question of measurement,” in Quantum 
Theory of Gravity, ed. S. M. Christensen (Hilger, Bristol 1984) p.224-233. 


[40] W. A. Miller and J. A. Wheeler: “Delayed-choice experiments and Bohr’s elemen- 
tary quantum phenomenon,” in Proceedings of International Symposium of Foun- 
dations of Quantum Mechanics in the Light of New Technology, Tokyo, 1983, eds. 
S. Kamefuchi et al. (The Physical Society of Japan, Tokyo, 1984) p.140-151. 


[41] J. A. Wheeler: “Bohr’s ‘phenomenon’ and ‘law without law’ ” in Chaotic Behavior 
in Quantum Systems, ed. G. Casati (Plenum, New York, 1985) p.363-378. 


[42] A. Kheyfets and J. A. Wheeler: “Boundary of a boundary principle and geometric 
structure of field theories,” Int. J. Theo. Phys. 25 (1986) 573-580. 


[43] J. A. Wheeler: “Physics as meaning circuit: three problems,” in Frontiers of Non- 
Equilibrium Statistical Physics, eds. G. T. Moore and M. O. Scully (Plenum, New 
York, 1986) p.25-32. 


[44] J. A. Wheeler: “Interview on the role of the observer in quantum mechanics,” in 
The Ghost in the Atom, eds. P. C. W. Davies and J. R. Brown (Cambridge University 
Press, Cambridge, 1986) p.58-69. 


[45] J. A. Wheeler: “How come the quantum,” in New Techniques and Ideas in Quantum 
Measurement Theory ed. D. M. Greenberger (Ann. New York Acad. Sci. 480 (1987) 
p.304-316.) 


[46] J. A. Wheeler: “Hermann Weyl and the unity of knowledge,” in Exact Sciences and 
their Philosophical Foundations, eds. W. Deppert et al. (Lang, Frankfurt am Main, 
1988) p.469-503; appeared in abbreviated form in American Scientist 74 (1986) 
366-375. 


[47] J. A. Wheeler: “World as system self-synthesized by quantum networking,” IBM 
J. of Res. and Dev. 32 (1988) 4-15; reprinted, in Probability in the Sciences, ed. E. 
Agazzi (Kluwer, Amsterdam, 1988) p.103-129. 


[48] D. M. Greenberger, ed.: New Techniques and Ideas in Quantum Measurement The- 
ory (Annals of the New York Academy of Sciences, Vol.480 (1986)). 


[49] B. d’Espagnat: Reality and the Physicist: Knowledge, Duration and the Quantum 
World (Cambridge University Press, Cambridge, 1989). 


[50] P. Mittelstaedt and E. W. Stachow, eds: Recent Developments in Quantum Logic 
(Bibliographisches Institut, Zürich, 1985). 


[51] J. S. Bell: Speakable and Unspeakable in Quantum Mechanics: Collected Papers in 
Quantum Mechanics (Cambridge University Press, Cambridge, 1987). 


[52] J. W. Tukey: “Sequential conversion of continuous data to digital data,” Bell Lab- 
oratories memorandum of 1 September 1947 marks the introduction of the term 
“bit” reprinted in Origin of the term bit, ed. H. S. Tropp (Annals Hist. Computing 
6 (1984) 152-155.) 


[53] W. K. Wootters and W. H. Zurek: “A single quantum cannot be cloned,” Nature 
279 (1982) 802-803. 


[54] W. K. Wootters and W. H. Zurek, “On replicating photons,” Nature, 304 (1983) 188- 
189. 


[55] Y. Aharonov and D. Bohm: “Significance of electromagnetic potentials in the quantum 
theory” Phys. Rev. 115 (1959) 485-491; J. D. Bekenstein: Baryon Number, Entropy, 
and Black Hole Physics, Ph.D. thesis, Princeton University (1972); photocopy avail- 
able from University Microfilms, Ann Arbor, Michigan. 


[56] J. Anandan: “Comment on geometric phase for classical field theories,” Phys. Rev. 
Lett. 60 (1988) 2555. 


[57] J. Anandan and Y. Aharonov: “Geometric quantum phase and angles,” Phys. Rev. 
D38 (1988) 1863-1870; includes references to the literature of the subject. 


[58] J. D. Bekenstein: “Black holes and the second law,” Nuovo Cimento Lett. 4 (1972) 
737-740. 


[59] J. D. Bekenstein: “Generalized second law of thermodynamics in black-hole 
physics,” Phys. Rev. D9 (1973) 3292-3300. 


[60] J. D. Bekenstein: “Black-hole thermodynamics” Physics Today 33 (1980) 24-31. 


[61] R. Penrose: “Gravitational collapse: the role of general relativity,” Riv. Nuovo 
Cimento 1 (1969) 252-276. 


[62] D. Christodoulou: “Reversible and irreversible transformations in black-hole 
physics,” Phys. Rev. Lett. 25 (1970) 1596-1597. 


[63] D. Christodoulou and R. Ruffini: “Reversible transformations of a charged black 
hole,” Phys. Rev. D4 (1971) 3552-3555. 


[64] S. W. Hawking: “Particle creation by black holes,” Commun. Math. Phys. 43 (1975) 
199-220. 


[65] S. W. Hawking: “Black holes and thermodynamics,” Phys. Rev. 13 (1976) 191-197. 


[66] W. H. Zurek and K. S. Thorne: “Statistical mechanical origin of the entropy of a 
rotating, charged black hole,” Phys. Rev. Lett. 20 (1985) 2171-2175. 


[67] MTW, p.1217. 
[68] W. Shakespeare: The Tempest, Act IV, Scene 1, lines 148 ff. 


[69] T. Mann: Freud, Goethe, Wagner (New York, 1937) p.20; trans. by H. T. Lowe- 
Porter from Freud und die Zukunft (Vienna, 1936). 


[70] G. W. Leibniz as cited in J. R. Newman: The World of Mathematics (Simon and 
Schuster, New York, 1956). 


[71] N. E. Steenrod: Cohomology Operations (Princeton University Press, Princeton, 
New Jersey, 1962). 


[72] C. Ehresmann: Catégories et Structures (Dunod, Paris, 1965). 


[73] D. Lohmer: Phänomenologie der Mathematik: Elemente einer Phänomenologischen 
Aufklärung der Mathematischen Erkenntnis nach Husserl (Kluwer, Norwell, Mas- 
sachusetts, 1989). 


[74] A. Weil: “De la metaphysique aux mathematiques,” Sciences, p.52-56; reprinted in 
A. Weil: Oeuvres Scientifiques: Collected Works, Vol. 2, 1951-64 (Springer, New 
York, 1979) p.408-412. 


[75] J. D. Barrow and F. J. Tipler: The Anthropic Cosmological Principle (Oxford Uni- 
versity Press, New York, 1986) and literature therein cited. 


[76] See for example the survey by F. Feferman: “Turing in the land of O(z),” and related 
papers on mathematical logic, in R. Herken: The Universal Turing Machine: A Half 
Century Survey, (Kammerer and Unverzagt, Hamburg and Oxford University Press, 
New York, 1988) p.113-147. 


[77] H. Weyl: “Mathematics and logic.” A brief survey serving as a preface to a review 
of The Philosophy of Bertrand Russell, Am. Math. Monthly 53 (1946) 2-13. 


[78] W. V. O. Quine: p.18 in the essay “On what there is,” in From a Logical Point of 
View, 2nd ed. (Harvard University Press, Cambridge, Massachusetts, 1980) p.1-19. 


[79] Discovered among graffiti in the men’s room of the Pecan Street Cafe, Austin, Texas. 


[80] G. W. Leibniz: Animadversiones ad Joh. George Wachteri librum de recondita He- 
braeorum philosophia, c. 1708, unpublished; English translation in P. P. Wiener, 
Leibniz Selections (Scribners, New York, 1951) p.488. 


[81] A. Einstein: as quoted by A. Forsee in Albert Einstein Theoretical Physicist (Macmil- 
lan, New York, 1963) p.81. 


[82] MTW, 43.4. 
[83] Ref. 22, p.411. 
[84] E. H. Spanier: Algebraic Topology (McGraw-Hill, New York, 1966). 


[85] E. Cartan: La Geometrie des Espaces de Riemann, Memorial des Sciences Mathe- 
matiques (Gauthier-Villars, Paris, 1925). 


[86] E. Cartan: Lecons sur la Geometrie des Espaces de Riemann (Gauthier-Villars, 
Paris, 1925). 


[87] MTW, Chap. 15. 
[88] M. Atiyah: Collected Papers. Vol. 5: Gauge Theories (Clarendon, Oxford, 1988). 
[89] Ref. 21, p.41-42; ref. 22, p.397-398. 


[90] W. K. Wootters and W. H. Zurek: “Complementarity in the double-slit experiment: 
quantum nonseparability and a quantitative statement of Bohr’s principle,” Phys. 
Rev. D19 (1979) 473-484. 


[91] W. Heisenberg: “Über den anschaulichen Inhalt der quantentheoretischen Kine- 
matik und Mechanik,” Zeits. f. Physik 43 (1927) 172-198; English translation in 
WZ, p.62-84. 


[92] N. Bohr and L. Rosenfeld: “Zur Frage der Messbarkeit der elektromagnetischen 
Feldgrössen,” Mat.-fys. Medd. Dan. Vid. Selsk. 12, no.8 (1933); English translation 
by Aage Petersen, 1979, reprinted in WZ, 479-534. 


[93] D. J. Gross: “On the calculation of the fine-structure constant,” Phys. Today 42, 
No.12 (1989). 


[94] N. F. Mott: “The wave mechanics of α-ray tracks,” Proc. Roy. Soc. London A126 
(1929) 74-84; reprinted in WZ, p.129-134. 


[95] H. D. Zeh: “On the interpretation of measurement in quantum theory,” Found. 
Phys. 1 (1970) 69-76. 


[96] H. D. Zeh: The Physical Basis of the Direction of Time (Springer, Berlin, 1989). 


[97] E. Joos and H. D. Zeh: “The emergence of classical properties through interaction 
with the environment,” Zeits. f. Physik B59 (1985) 223-243. 


[98] W. H. Zurek: “Pointer basis of quantum apparatus: Into what mixture does the 
wavepacket collapse?,” Phys. Rev. D24 (1981) 1516-1525. 


[99] W. H. Zurek: “Environment-induced superselection rules,” Phys. Rev. D26 (1982) 
1862-1880. 


[100] W. H. Zurek: “Information transfer in quantum measurements: irreversibility and 
amplification,” in Quantum Optics, Experimental Gravitation and Measurement 
Theory, eds. P. Meystre and M. O. Scully (Plenum, New York, 1983) p.87-116. 


[101] W. G. Unruh and W. H. Zurek: “Reduction of a wave packet in quantum Brownian 
motion,” Phys. Rev. D40 (1989) 1071-1094. 


[102] J. B. Hartle: “Progress in quantum cosmology,” preprint from Physics Department, 
University of California at Santa Barbara, 1989. 


[103] M. B. Green, J. H. Schwarz and E. Witten: Superstring Theory (Cambridge Uni- 
versity Press, Cambridge, U.K., 1987). 


[104] L. Brink and M. Henneaux: Principles of String Theory: Studies of the Centro de 
Estudios Cientificos de Santiago (Plenum, New York, 1988). 


[105] S. W. Hawking: “The Boundary Conditions of the Universe,” in Astrophysical Cos- 
mology, Pontificia Academia Scientiarum, eds. H. A. Brück, G. V. Coyne and M. S. 
Longair (Vatican City, 1982) p. 563-594. 


[106] A. Vilenkin: “Creation of universes from nothing,” Phys. Lett. B117 (1982) 25-28. 


[107] J. B. Hartle and S. W. Hawking: “Wave function of the universe,” Phys. Rev. D28 
(1983) 2960-2975. 


[108] W. K. Wootters: “The acquisition of information from quantum measurements,” 
Ph.D. dissertation, University of Texas at Austin (1980). 


[109] W. K. Wootters: “Statistical distribution and Hilbert space,” Phys. Rev. 23 (1981) 
357-362. 


[110] R. A. Fisher: “On the dominance ratio,” Proc. Roy. Soc. Edin. 42 (1922) 321-341. 


[111] R. A. Fisher: Statistical Methods and Statistical Inference (Hafner, New York, 1956) 
p.8-17. 


[112] E. C. G. Stueckelberg: “Théorème H et unitarité de S,” Helv. Phys. Acta 25 (1952) 
577-580. 


[113] E. C. G. Stueckelberg: “Quantum theory in real Hilbert space,” Helv. Phys. Acta 
33 (1960) 727-752. 


[114] D. S. Saxon: Elementary Quantum Mechanics (Holden, San Francisco 1964). 


[115] H. J. Larson: Introduction to Probability Theory and Statistical Inference, 2nd ed. 
(Wiley, New York, 1974). 


[116] E. Schrödinger: “The Foundation of the Theory of Probability,” Proc. Roy. Irish 
Acad. 51A (1947) 51-66 and 141-146. 


[117] E. T. Jaynes: “Bayesian methods: General background,” in Maximum Entropy and 
Bayesian Methods in Applied Statistics, ed. J. H. Justice (Cambridge University 
Press, Cambridge, U.K., 1986) p.1-25. 


[118] R. Viertl, ed.: Probability and Bayesian Statistics (World Scientific, Singapore, 
1987). 


[119] R. D. Rosenkrantz, ed.: E. T. Jaynes: Papers on Probability, Statistics and Statis- 
tical Physics (Reidel-Kluwer, Hingham, Massachusetts, 1989). 


[120] P. J. Denning: “Bayesian learning,” American Sci. 77 (1989) 216-218. 


[121] J. O. Berger and D. A. Berry: “Statistical analysis and the illusion of objectivity,” 
American Sci. 76 (1988) 159-165. 


[122] J. Burke: The Day the Universe Changed (Little, Brown, Boston, Massachusetts, 
1985). 


[123] F. Beck [pseudonym of the early nuclear-reaction-rate theorist Fritz Houtermans] 
and W. Godin: translated from the German original by E. Mosbacher and D. Porter, 
Russian Purge and The Extraction of Confessions (Hurst and Blackett, London, 
1951). 


[124] D. Føllesdal: “Meaning and experience,” in Mind and Language, ed. S. Guttenplan 
(Clarendon, Oxford, 1975) p. 25-44. 


[125] G. Berkeley: Treatise Concerning the Principles of Understanding, Dublin (1710; 
2nd ed. 1734); re his reasoning that “No object exists apart from mind,” cf. article 
Berkeley by R. Adamson: Encyclopedia Britannica, Chicago 3 (1959) 438. 


[126] T. Segerstedt: as quoted in ref. 22, p.415. 


[127] Ref. 21, p.41. 


[128] Ya. B. Zel’dovich and I. D. Novikov: Relativistic Astrophysics, Vol. 1: Stars and 
Relativity (University of Chicago Press, Chicago, 1971). 


[129] J. Mather et al.: “A preliminary measurement of the cosmic microwave background 
spectrum by the Cosmic Background Explorer (COBE) Satellite,” submitted for 
publication, Astrophys. J. Lett. (1990). 


[130] MTW, p.738, Box 27.4; or JGST, Chap. 13, p.242. 


[131] G. K. O’Neill: The High Frontier, 4th ed. (Space Studies Institute, Princeton, New 
Jersey, 1989). 


[132] R. Jastrow: Journey to the Stars: Space Exploration — Tomorrow and Beyond 
(Bantam, New York, 1989). 


[133] K. Popper: Conjectures and Refutations: the Growth of Scientific Knowledge (Basic 
Books, New York, 1962). 


[134] R. W. Fuller and P. Putnam: “On the origin of order in behavior,” General Systems 
(Ann Arbor, Michigan) 12 (1966) 111-121. 


[135] R. W. Fuller: “Causal and Moral Law: Their Relationship as Examined in Terms 
of a Model of the Brain,” Monday Evening Papers (Wesleyan University Press, 
Middletown, Connecticut, 1967). 


[136] G. M. Edelman: Neural Darwinism (Basic Books, New York, 1987). 


[137] W. H. Calvin: The Cerebral Symphony (Bantam, New York, 1990). 


[138] W. W. Collins: The Moonstone (London, 1968). 


[139] J. Allan Hobson: Sleep (Scientific American Library, Freeman, New York, 1989) 
p.86, 89, 175, 185, 186. 


[140] I. Langmuir: “Pathological Science,” 1953 colloquium, transcribed and edited, Phys. 
Today, 42, No.12 (1989) 36-48. 


[141] N. S. Hetherington: Science and Objectivity: Episodes in the History of Astronomy 
(Iowa State University Press, Ames, Iowa, 1988). 


[142] W. Sheehan: Planets and Perception: Telescopic Views and Interpretations (Uni- 
versity of Arizona Press, Tucson, Arizona, 1988). 


[143] M. White: Science and Sentiment in America: Philosophical Thought from Jonathan 
Edwards to John Dewey (Oxford University Press, New York, 1972). 


[144] C. S. Peirce: The Philosophy of Peirce: Selected Writings, ed. J. Buchler (Rout- 
ledge and Kegan Paul, London, 1940), passages from p.337, 335, 336, 353 and 358; 
reprinted in ref. 18, p.593-595. Peirce’s position on the forces of Nature, “May they 
not have naturally grown up,” foreshadow though it does the concept of world as 
self-synthesized system, differs from it in one decisive point, in that it tacitly takes 
time as primordial category supplied free of charge from outside. 


[145] Parmenides of Elea [c. 515 B.C.-450 B.C.], poem Nature, part Truth, as summarized 
by A. C. Lloyd in article Parmenides, Encyclopedia Britannica, Chicago 17 (1959) 
327. 


[146] G. E. Pugh: On the Origin of Human Values (New York, 1976); chapter Human 
values, free will, and the conscious mind preprinted in Zygon 11 (1976) 2-24. 


[147] F. W. J. von Schelling [1775-1854]: in Schellings Werke, nach der Originalausgabe 
in neuer Anordnung herausgegeben, 6 vols., ed. M. Schröter, (Beck, München, 1958- 
1959), esp. Vol.5, p.428-430, as kindly summarized for me by B. Kanitscheider: “dass 
das Universum von vornherein ein ihm immanentes Ziel, eine teleologische Struk- 
tur, besitzt und in allen seinen Produkten auf Evolutionäre Stadien ausgerichtet 
ist, die schliesslich die Hervorbringung von Selbstbewusstsein einschliessen, welches 
dann aber wiederum den Entstehungsprozess reflektiert und diese Reflexion ist die 
notwendige Bedingung für die Konstitution der Gegenstände des Bewusstseins.” 


[148] J. von Neumann and O. Morgenstern: Theory of Games and Economic Behavior 
(Princeton University Press, Princeton, New Jersey, 1944). 


[149] J. Wang: Theory of Games (Oxford University Press, New York, 1988). 


[150] J. R. Pierce: Symbols, Signals and Noise: The Nature and Process of Communica- 
tion (Harper and Brothers, New York, 1961). 


[151] M. Schwartz: Telecommunication Networks: Protocols, Modeling and Analysis 
(Addison-Wesley, Reading, Massachusetts, 1987). 


[152] M. S. Roden: Digital Communication Systems Design (Prentice Hall, Englewood 
Cliffs, New Jersey, 1988). 


[153] P. W. Anderson: “More is different,” Science 177 (1972) 393-396. 


[154] C. Mead and L. Conway: Introduction to VLSI [very large-scale integrated-circuit 
design] Systems (Addison-Wesley, Reading, Massachusetts, 1980). 


[155] P. B. Schneck: Supercomputer Architecture (Kluwer, Norwell, Massachusetts, 1987). 


[156] F. E. Yates, ed.: Self-Organizing Systems: The Emergence of Order (Plenum, New 
York, 1987). 


[157] H. Haken: Information and Self-Organization: A Macroscopic Approach to Complex 
Systems (Springer, Berlin, 1988). 


[158] T. Kohonen: Self-Organization and Associative Memory, 3rd ed. (Springer, New 
York, 1989). 


[159] C. Smorynski: Self-reference and Modal Logic (Springer, Berlin, 1985). 


[160] G. J. Chaitin: Algorithmic Information Theory, rev. 1987 ed., (Cambridge Univer- 
sity, Cambridge, 1988). 


[161] J. P. Delahaye: “Chaitin’s equation; an extension of Gödel’s theorem,” Notices 
Amer. Math. Soc. 36 (1989) 948-987. 


[162] J. F. Traub, G. W. Wasilkowski and H. Wozniakowski: Information-Based Com- 
plexity (Academic, San Diego, 1988). 


[163] P. Young: The Nature of Information (Praeger Greenwood, Westport, Connecticut, 
1987). 


[164] W. H. Zurek: “Algorithmic randomness and physical entropy,” Phys. Rev. A40 
(1989) 4731-4751. 


[165] M. Eigen and R. Winkler: Das Spiel: Naturgesetze steuern den Zufall (Piper, 
München, 1975). 


[166] W. M. Elsasser: Reflections on a Theory of Organisms (Orbis, Frelighsburg, Quebec, 
1987). 


[167] G. Nicolis and I. Prigogine: Exploring Complexity: An Introduction (Freeman, New 
York, 1989). 


[168] S. Watanabe, ed.: Methodologies of Pattern Recognition (Academic, New York, 1967). 


[169] J. Tou and R. C. Gonzalez: Pattern Recognition Principles (Addison-Wesley, Read- 
ing, Massachusetts, 1974). 


[170] H. Haken, ed.: Pattern Formation by Dynamic Systems and Pattern Recognition 
(Springer, Berlin, 1979). 


[171] H. Small and E. Garfield: “The geography of science: disciplinary and national 
mappings,” J. of Info. Sci. 11 (1985) 147-159. 


[172] M. Agu: “Field theory of pattern recognition,” Phys. Rev. A37 (1988) 4415-4418. 


[173] M. Minsky and S. Papert: Perceptrons: An Introduction to Computational Ge- 
ometry, 2nd ed. (Massachusetts Institute of Technology Press, Cambridge, Mas- 
sachusetts, 1988). 


[174] L. A. Steen: “The science of patterns,” Science 240 (1988) 611-616. 


[175] B. M. Bennett, D. D. Hoffman and C. Prakash: Observer Mechanics: A Formal 
Theory of Perception (Academic, San Diego, California, 1989). 


[176] J. A. Wheeler: “Polyelectrons,” Ann. New York Acad. Sci. 46 (1946) 219-238. 


[177] D. Bohm: “The paradox of Einstein, Rosen and Podolsky,” originally published as 
section 15-19, Chapter 22 of D. Bohm: Quantum Theory (Prentice-Hall, Englewood 
Cliffs, N.J., 1950), reprinted in WZ, 356-368. 




[178] W. H. Zurek: “Thermodynamic cost of computation: Algorithmic complexity and 
the information metric,” Nature 34 (1989)119-124. 


[179] E. F. Taylor and J. A. Wheeler: Spacetime Physics (Freeman, San Francisco, 1963) 
p.102. 


20 


FEYNMAN, BARTON AND THE REVERSIBLE 
SCHRODINGER DIFFERENCE EQUATION 


Ed Fredkin 


Ed Barton, then an MIT undergraduate, had agreed to work for me. I had had 
in mind a summer project and was very pleased that Ed was willing to accept the 
position. My goal was to solve the problem of creating discrete, reversible systems 
of various equations of physics. What this meant was that a computer programmed 
to compute the dynamics of such a system would be able to run forward any number 
of steps (repeatedly computing the future state from the present state) and then 
exactly retrace its steps running backwards (repeatedly computing 
the past state from the present state). The entire problem lay in the one word 
“exactly”. Almost any programmed model of any physical system could be made 
to reverse its course in time, but such systems never retraced their steps exactly! 
I felt that I had made really good progress recently and the time was at hand to 
push for a breakthrough; finding powerful, general methods for creating esthetically 
pleasing exactly reversible equations in most every area of dynamical systems. 


To understand what the problem was you need to understand how we then 
thought about calculations in general. Most workers felt that computers simply 
could not compute things in an exactly reversible fashion without saving all the 
intermediate steps. Rollo Silver had come up with a method (what we would call a 
“hack”) that could compute exactly backwards by saving only the initial conditions 
and the number of steps. (After going forwards n steps, it could go backwards one 
step by returning to the beginning and then going forwards n − 1 steps. It could 
continue going back to the beginning and then going forwards n − 2 steps, n − 3 
steps...) Toffoli had figured out how to create a reversible cellular automaton by 
saving every step in an extra dimension! Both ideas were important but both were 
esthetically obnoxious. It was 1975 and I had recently returned to MIT after a year 
spent at Caltech. 
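Silver's replay method, as described above, can be sketched in a few lines of Python. This is only an illustration: the toy `step` function and all names are assumptions, not Silver's actual code. It shows that the only things stored are the initial state and a step count, and that the price is a quadratic number of forward steps.

```python
def step(state):
    """One forward step of a toy dynamics that is not invertible (hypothetical example)."""
    return (state * state + 1) % 1000


def state_after(initial, n):
    """Recompute the state after n forward steps from the saved initial state."""
    s = initial
    for _ in range(n):
        s = step(s)
    return s


def backwards_by_replay(initial, n):
    """Visit states n, n-1, ..., 0 in that order, storing only `initial` and a counter."""
    return [state_after(initial, m) for m in range(n, -1, -1)]


def replay_cost(n):
    """Total forward steps needed to walk back from step n: n + (n-1) + ... + 1."""
    return n * (n + 1) // 2


forward_history = [state_after(7, m) for m in range(6)]
assert backwards_by_replay(7, 5) == forward_history[::-1]
```

The retraced path is exact, but walking back n steps costs n(n+1)/2 forward steps.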


I had been invited to Caltech, as a Fairchild Distinguished Scholar to work with 
Richard Feynman. We had a deal; he was to teach me quantum mechanics (from 
his perspective) and I was to teach him computer science (from my perspective). At 
the end of the year he complained to me that I got the better of the deal and I had 
to agree but the margin was close. It wasn’t from a lack of trying on my part; it was 
very hard to teach Feynman something because he didn’t want to let anyone teach 
him anything. What Feynman always wanted was to be told a few hints as to what 
the problem was and then to figure it out for himself. When you tried to save him 
time by just telling him what he needed to know, he got angry because you would 
be depriving him of the satisfaction of discovering it for himself. I'll never forget 
what happened when I showed him one of the first HP-35 calculators. This was the 
first really complicated calculator and it used RPN (Reverse Polish Notation) instead 
of standard algebraic notation like the TI calculators did. I showed it to Feynman 
and he was delighted. He grabbed it but he wouldn’t allow me to tell him anything 
about it! It took him many hours of fiddling to figure out how to use most of its 
features and having cracked its code he was through with it. 


As for me, I wanted Feynman to teach me. My trouble was that I was pretty 
sure of what I wanted to learn and what I didn’t want to learn. I was also somewhat 
hardheaded and too ready to argue. Feynman was wonderful about not bothering 
me with what I didn’t need. He assured me that for my purposes everything I 
needed was to be found in the Feynman Lectures. I would read something, ask 
questions and Feynman would answer my questions. It was great fun as I tried to 
guide him. 


“Let’s make things simple and first look at the one particle case,” I might sug- 
gest. Feynman would get fierce. “You'll never get it if you keep trying to look 
at what happens with just one particle.” I would argue for a bit and then real- 
ize why Feynman was right. What was so wonderful was that he never gave up 
on me and always persisted to make sure that I got the basic knowledge without 
misconceptions. 


While at Caltech my assignment to myself other than learning Quantum Me- 
chanics (QM) was to solve the problem of reversible computation. It was a won- 
derful year and I learned much about QM and I invented Conservative Logic and 
the so-called “Fredkin Gate”. Teaching me QM was almost as hard as teaching 
Feynman about computers. However, Feynman was able to cope with my foibles, 
he was persistent and fierce. We had talks, discussions and raging arguments; but 
it was always Feynman who was raging. Most memorable to me was once when we 
continued my physics lesson into lunchtime, walking to the greasy spoon cafeteria 
on campus. It was about an important issue and I was quietly refusing to go along 
with Feynman’s position. He kept his cool until we had just about finished eating 
and then he couldn’t contain his annoyance with me. Suddenly he stood up, so 
as to tower over me (I was still sitting and eating the last few bites of lunch). He 
started screaming at the top of his lungs about how stupid I was and why I had to 
look at it his way etc. I could only smile because I thought the scene was so funny 
and I was complimented that he cared so much, and anyway, I was sure that I was 
right. 

By this time we had attracted the attention of almost everyone in the restaurant. 
It was the kind of thing that Feynman loved. People more than a few tables away 
had started to stand up to get a better view. While it’s not true that Feynman was 
foaming at the mouth, nevertheless bits of saliva were landing on me and my plate. 
Even so, it wasn’t enough for Feynman; he wasn’t towering over me enough. He 
suddenly looked at his chair, grabbed it and repositioned it so he could stand up 
on the chair and maybe climb up on the table so as to have a better platform from 
which to shout down at me. I got alarmed because I always worried that, physically, 
to some extent, Feynman was fairly fragile. I jumped up, grabbed his arm and said, 
“Let’s go.” He hesitated for a moment, a bit annoyed at the prospect of losing the 
opportunity to do something so dramatic. He thought for a second more and then 
he did a peculiarly Feynman sort of thing. He continued beating on me verbally 
but the volume returned to normal. However each word was spoken with extreme 
emphasis, exaggerated stress and stretched out S’s that had him hissing like a snake. 
The various spectators returned to their seats as he continued to wind down; we 
left the restaurant and walked back to our offices in the Lauritsen building. 


The subject of the argument was my insistence on my approach to attacking the 
problem of creating discrete and reversible models of physical processes as opposed 
to Feynman’s counter proposal. I wanted a general method that started by dealing 
with the simplest kinds of problems such as point masses following Newton’s Laws, 
and Feynman insisted that I do nothing other than start with the Quantum Me- 
chanical description of physics. My approach was heuristic; though I might have 
been wrong, I was sure that reversibility was so all encompassing a principle as to 
apply to every concept of dynamics. I wanted to understand how it dealt with the 
simplest such systems before looking at QM and Feynman insisted that QM was all 
there was and I should not think about anything else. The goals were exactly the 
same. What Feynman couldn’t do, however, was to give me any help as to how to 
start with his approach. 


One day towards the end of the year, we were in Feynman’s office having another 
of the usual kinds of arguments when he suddenly got exasperated. He jumped up, 
went to the blackboard and started posing QM problems and asking me for answers. 
After a while he suddenly stopped the quiz, broke into a big smile and said, “The 
trouble with you is not that you don’t understand quantum mechanics.” From 
Feynman it was the highest form of compliment. 


After I returned to MIT, I and my students made rapid and remarkable progress. 
With hindsight the solutions that we eventually found were simple and clear (e.g. 
reversible difference equations, the Billiard Ball Model, the Margolus CA rule, etc.). 
As is common in such circumstances the problem was not just to find the solutions, 
rather it was also to figure out what the right questions were. It also turned out that 
achieving exact reversibility for computational models of physics was equally easy 
for both Newtonian physics and Quantum Mechanics. Feynman was not wrong; it 
was just that my intuition functioned better with Newtonian Physics. 


So, now that Barton had agreed to work for me, and given how bright he was, I 
had to plan out enough work to keep him busy for the summer. I invited Barton 
to my office. 


I liked my office. It was in Tech Square, a modern 9-story building that housed 
MIT’s Laboratory for Computer Science and a few other tenants. It was about half 
a block from the main campus. I had done two unusual things to my office. I had put 
a wall to wall rug on the floor and had obtained an actual blackboard. Other than 
that it was similarly furnished to other Tech Square offices. When Barton arrived I 
told him that I wanted to make up a list of the problems we wanted to work on and 
try to solve over the summer. I had by then already discovered a few ad hoc ways of 
creating discrete, reversible equations for various systems and I now wanted to do 
more. I wanted to find powerful general methods and now, at last, attack physics 
from the QM perspective. I went to the blackboard and started to write down the 
various things we would work on. And then I got to the Schrodinger Equation; a 
wonderful differential equation that basically describes how amplitudes change over 
space and time. 


In mathematics we speak of a derivative. It’s not too complicated. An example is 
velocity, which is the first derivative of position with respect to time. “Going north 
at 60 Miles Per Hour” expresses the notion of velocity, how position will change with 
respect to time. “Zero to 60 in 6 seconds” expresses the notion of acceleration, how 
speed will change with time. We say that velocity is the first derivative of position 
with respect to time and that acceleration is the second derivative of position with 
respect to time. 


Derivatives are not always with respect to time. As we ascend in an airplane, 
we may notice that the air pressure (which is about 15 PSI at sea level) decreases 
about ½ PSI per 1,000 feet. The first derivative of pressure with respect to altitude 
is −½ PSI per 1,000 feet at sea level. However that seems to imply that the pressure 
would get to zero by 30,000 feet. It turns out that the rate of decrease changes with 
altitude! A better rough approximation is that the pressure drops in half for every 
18,000-foot increase in altitude. The second derivative of pressure with respect to 
altitude is about how the rate of change (the first derivative) changes with altitude. 
In this case, the rate of change also drops in half about every 18,000 feet. This 
means that at 36,000 feet the pressure is about 3.75 PSI and the rate of change is 
about −⅛ PSI per 1,000 feet. 
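The halving rule of thumb above can be turned into a small Python model. This is a sketch under the text's own rough numbers (15 PSI at sea level, halving every 18,000 feet); the function names are illustrative assumptions. The exact derivative of this model comes out somewhat steeper than the rounded −½ and −⅛ figures quoted above, but it halves in step with the pressure, which is the point of the passage.

```python
import math

P0_PSI = 15.0           # sea-level pressure used in the text
HALVING_FT = 18_000.0   # altitude gain over which pressure halves (rule of thumb)


def pressure(alt_ft):
    """Pressure in PSI under the 'halves every 18,000 feet' approximation."""
    return P0_PSI * 2.0 ** (-alt_ft / HALVING_FT)


def rate_per_1000ft(alt_ft):
    """Exact first derivative of pressure(), expressed in PSI per 1,000 feet."""
    return -math.log(2.0) / HALVING_FT * pressure(alt_ft) * 1000.0


print(pressure(36_000.0))                  # 3.75
print(round(rate_per_1000ft(0.0), 3))      # about -0.578
print(round(rate_per_1000ft(36_000.0), 3)) # about -0.144
```

Both the pressure and its rate of change at 36,000 feet are one quarter of their sea-level values, since two halvings have occurred.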


Derivatives are creatures of the Calculus. They assume that the quantities in- 
volved can vary continuously. In a computer all the numbers are basically integers. 
Computers are discrete systems that are the antithesis of continuous systems. In- 
stead of Differential Equations we have Difference Equations. Instead of an equation 
that can tell us the values at any point, we have equations that can tell us the val- 
ues at a particular set of discrete points. It’s like listening to a clock that ticks 
every second and is silent the rest of the time. It’s how computers work, doing one 
discrete step at a time. 


I had told Feynman that I wanted to work with a discrete version of a particular 
form of the Schrödinger equation and I wanted some help on how to derive it. He 
suggested that I take a look at the Feynman Lectures, Volume III on Propagation 
in a Crystal Lattice. I did and I found a very suggestive version of the Hamiltonian 
(13.3 and 13.4) but it didn’t all jell in my mind. 


$$ i\hbar \frac{dC_n(t)}{dt} = E_0 C_n(t) - A C_{n+1}(t) - A C_{n-1}(t) \tag{20.1} $$


Feynman Lectures Vol III Eq 13.3 


I read on and discovered more of what I wanted in Chapter 16. (Page 16-4, 
equation 16.10). I was pretty good at seeing a difference equation when looking at 
a differential equation so I fixated on (16.13) as my starting point. 


$$ i\hbar \frac{\partial C(x_n)}{\partial t} = (E_0 - 2A)\,C(x_n) + A\,[2C(x_n) - C(x_n + b) - C(x_n - b)] \tag{20.2} $$


Feynman Lectures Vol III Eq 16.10 


The Schrodinger Equation involves both a first derivative with respect to time 
and a second derivative with respect to space. To keep it simple I wanted to start 
with the one-dimensional form of the differential equation. 


$$ i\hbar \frac{\partial C(x)}{\partial t} = -\frac{\hbar^2}{2m} \frac{\partial^2 C(x)}{\partial x^2} \tag{20.3} $$


Free space one dimensional differential version of the Schrodinger Equation 


Writing down the difference equation was easy but annoying. The reason was 
that the right side (the second derivative with respect to space) came out beautifully 
symmetrical, but the left side (the first derivative with respect to time) did not. 


$$ C_{x,t+1} - C_{x,t} = ik\,(C_{x-1,t} - 2C_{x,t} + C_{x+1,t}) \tag{20.4} $$

Asymmetric difference version of the Schrodinger Equation 


My esthetic sense was not pleased. Well, that was the starting point. I sort of 
lectured Barton on the equation (though he had taken such courses at MIT and in 
some ways knew more than me). I had written down about 10 research items on 
the blackboard. I went on talking and completed my discussion of the task list for 
the summer project. I told Barton that the list was merely a wish list and that I 
didn’t dream of getting everything done. 


Here we must digress for you to understand what was happening. I wanted 
to present Barton with the best starting point in what was to be his quest for the 
summer. He was to do a series of projects with the keystone being to find a beautiful, 
discrete and reversible version of the Schrödinger Equation. I needed to get him 
started down the right path and also needed to feel that he really understood what I 
was looking for. When you are searching for the right equation you often are guided 
by a sense of esthetics; some equations are ugly and some are beautiful. Almost 
anyone other than me would have no trouble writing down the first difference with 
respect to time; it’s just the difference between the values at 2 points in time. 
If the Pressure, P, is 15 PSI at sea level ($P_0 = 15$) and 14.5 PSI at 1,000 feet 
($P_{1000} = 14.5$) then the first difference would be $P_{1000} - P_0 = -\tfrac{1}{2}$ PSI per 1,000 
feet. The trouble was the lack of symmetry; at what altitude is the first difference 
exactly −½? 
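The asymmetry complaint can be made concrete with a short Python sketch. It assumes the rough pressure model quoted earlier in the chapter (15 PSI at sea level, halving every 18,000 feet); the function names are illustrative. A one-sided difference taken between 0 and 1,000 feet is really a symmetric difference centred at 500 feet, and at any given altitude the symmetric form tracks the true derivative far more closely.

```python
import math

P0_PSI = 15.0
HALVING_FT = 18_000.0


def pressure(alt_ft):
    """Rule-of-thumb pressure model: halves every 18,000 feet."""
    return P0_PSI * 2.0 ** (-alt_ft / HALVING_FT)


def exact_rate(alt_ft):
    """True derivative of pressure() in PSI per foot."""
    return -math.log(2.0) / HALVING_FT * pressure(alt_ft)


def forward_difference(f, x, h):
    """Asymmetric first difference: uses the points x and x + h."""
    return (f(x + h) - f(x)) / h


def central_difference(f, x, h):
    """Symmetric first difference: uses the points x - h and x + h."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


x, h = 1_000.0, 1_000.0
err_forward = abs(forward_difference(pressure, x, h) - exact_rate(x))
err_central = abs(central_difference(pressure, x, h) - exact_rate(x))
assert err_central < err_forward

# The one-sided difference over [0, 1000] is the symmetric difference at 500 feet.
assert forward_difference(pressure, 0.0, 1_000.0) == central_difference(pressure, 500.0, 500.0)
```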


When I finished writing down the difference equation, I was fixated, staring 
at the blackboard. The chalk marks on my wonderful blackboard glared back. I 
couldn’t send Barton off on his task with that stupid asymmetrical first difference. 
Suddenly I had an inspiration. I erased the left side of the equation and replaced it 
with a perfectly symmetrical version of the first difference. 


$$ \frac{C_{x,t+1} - C_{x,t-1}}{2} = ik\,(C_{x-1,t} - 2C_{x,t} + C_{x+1,t}) \tag{20.5} $$


Symmetric difference version of the Schrodinger Equation 


I was pleased and proud of myself (despite the basic triviality of what I had 
done). It was the kind of thing that must have been done by others a thousand 
times, but it was what erased my sense of unease and I felt good about it. I then 
finished lecturing Barton on what was to be done with the equation. He was to 
search for some form that computed approximately the same thing, but that met 
the criteria necessary for being implemented on a computer in an exactly reversible 
way. I had given Barton a good start and I felt sure that his work was cut out for 
him. He would have a busy summer and, I hoped, a productive one. Barton was 
sitting in a chair near the door to my office while I lectured, and when I was done 
he remained seated staring at the blackboard. I went to my desk and started to 
busy myself with other stuff, letting Barton think while I waited to see if he had 
any questions. 


After a long silence with Barton continuously staring at the Schrödinger 
Equation, I said “What we have to do is to explore all the different forms that the 
Schrödinger Equation can take until we find one that is reversible.” Barton 
remained silent and continued staring at the equation. I was getting somewhat 
disconcerted. I thought that Barton was very bright but he seemed to not get the point. 
“Barton,” I said getting more annoyed, “we need a version that’s reversible!” 


Barton ignored me and kept staring at the board. “It is,” he said, quietly. And, 
after another long pause he again said, “It is.” 


Now I was really exasperated. “Look Barton, you don’t get it. I’ve written down 
a simple symmetric version of the Schrodinger Equation and what I want you to do 
is to spend the summer looking for a reversible version of it. Start with what’s on 
the board and try to make a version that’s reversible.” 


Barton still kept staring at the board and Sphinx-like said again, “It is.” 


“What do you mean by ‘It is’?” This was getting ridiculous. “What are you 
trying to say?” 


Ed Barton finally said “The equation you’ve written, I think it is reversible.” 


I was actually pissed off. “Ed, try and get this straight. I’ve done nothing but 
write down an ad hoc version of the equation just to give you a starting point. It’s 
not the answer; it’s the question. Take the question and go find the answer.” I 
had not spent any time thinking about or looking for a reversible version of the 
Schrodinger Equation. I liked to think of problems that I might be able to solve 
and refrain from working on them so I could hand them off to students. That way 
they could have the chance to be the first to solve some real problem. 


I could not imagine what Barton was getting at or why, and this was violating 
my sense of how I wanted this meeting to end. Suffering from an extreme attack 
of mental set, I simply wanted Barton to get started on the right footing and then 
go off and get to work. Somehow I couldn’t hear what he was saying. Barton 
finally broke through to me “I think that the equation you wrote on the board is 
the equation you want me to find. I think that it already is exactly reversible.” 


This time I finally looked at the equation and realized that by simply transposing 
one term to the other side it was obviously exactly reversible. That this was true 
was so counter-intuitive to me that I was flabbergasted. You see, in Physics we 
are normally satisfied with the differential equation. We use the difference form in 
order to write computer programs which are inexact and almost never reversible. 
But once the difference equation was written in a symmetrical fashion, it allowed 
itself to be transformed into an exactly reversible algorithm by merely moving one 
term to the other side. I had written it down in a moment of inspiration that was 
driven mostly by esthetics, and had failed to look at and understand what I had 
written. Ed Barton’s immediate observation of the reversibility of that equation 
remains one of my most striking memories of 18 years on the MIT faculty. 
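Barton's observation can be illustrated with a toy Python sketch. It uses a real scalar and an arbitrary lossy function rather than the Schrödinger equation itself; the function `lossy`, the factor 0.1 and the starting values are assumptions chosen only for illustration. The asymmetric update cannot be undone by the obvious guess, while the symmetric update is undone exactly by moving one term to the other side.

```python
def lossy(x):
    """Some function of the present state, rounded to an integer (information is lost here)."""
    return round(0.1 * x)


# Asymmetric (first-order) update: future = present + lossy(present).
def asym_forward(present):
    return present + lossy(present)


def asym_naive_backward(future):
    # A guess at the inverse: the true present is not available to evaluate lossy().
    return future - lossy(future)


# Symmetric (second-order) update: future = past + lossy(present).
def sym_forward(past, present):
    return present, past + lossy(present)


def sym_backward(present, future):
    # Transpose one term: past = future - lossy(present). The same rounded value is subtracted.
    return future - lossy(present), present


# The asymmetric form does not retrace its step: 1000 -> 1100 -> 990.
assert asym_naive_backward(asym_forward(1000)) == 990

start = (1000, 1003)
state = start
for _ in range(50):
    state = sym_forward(*state)
for _ in range(50):
    state = sym_backward(*state)
assert state == start
```

The rounding inside `lossy` is evaluated on the same present value in both directions, so the integer that was added going forward is the integer subtracted going back.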


As soon as Barton left my office I did what I had done so often under similar 
circumstances: I wrote a program in Lisp to run the equation, saw that it was 
reversible and nevertheless not pathological, then I picked up the phone and called 
Feynman. “Richard, we've just discovered the most amazing difference form of 
the Schrodinger Equation.” I then proceeded to tell Feynman what we found and 
how an esthetic principle helped us find it. He was fascinated but had no real 
comment. I also told him that in a simulation of what I called the “0-Dimensional 
Schrodinger Difference Equation” I had noticed that the amplitude (calculated in a 
simple way) did not drift off very far from its original value, which was surprising 
and serendipitous. For some reason I would always phone Feynman with my latest 
idea and he would always reply by writing me a letter. A week later I got a letter 
from Feynman with our new equation and a beautiful proof. Feynman showed 
that by calculating the amplitude for the 0-Dimensional case in a perfectly time 
symmetric fashion, keeping to my esthetic principle, then the values computed by 
the reversible difference equation exactly conserved the value of the amplitude. I 
was happy and chagrined. By using one of my favorite tricks on my own equation, 
Feynman had still managed to one-up me! 


Appendix 


The following equations and Mathematica functions illustrate the point of Feyn- 
man’s contribution with regard to the calculation of probabilities for the reversible 
difference form of the Schrodinger Equation. 


$$ i\hbar \frac{\partial C(x)}{\partial t} = -\frac{\hbar^2}{2m} \frac{\partial^2 C(x)}{\partial x^2} $$

The Schrödinger Equation for motion along a line in free space. 

$$ \frac{\partial C(x)}{\partial t} = ik \frac{\partial^2 C(x)}{\partial x^2} $$

Both sides divided by i; constants lumped into k. 

$$ D_t C_{x,t} = ik D_x^2 C_{x,t} $$

The difference form. 

$$ C_{x,t+1} - C_{x,t} = ik\,(C_{x-1,t} - 2C_{x,t} + C_{x+1,t}) $$

The explicit form with the asymmetric first difference on the left. 

$$ C_{x,t+1} = \{ik\,(C_{x-1,t} - 2C_{x,t} + C_{x+1,t})\} + C_{x,t} $$

The computational form from the above equation. Because of roundoff and 
truncation errors it is not exactly reversible. (The curly braces signify roundoff to a 
machine precision number.) 

$$ \frac{C_{x,t+1} - C_{x,t-1}}{2} = ik\,(C_{x-1,t} - 2C_{x,t} + C_{x+1,t}) $$

$$ C_{x,t+1} - C_{x,t-1} = 2ik\,(C_{x-1,t} - 2C_{x,t} + C_{x+1,t}) $$

The explicit form with the symmetric first difference on the left. 

$$ C_{x,t+1} = \{2ik\,(C_{x-1,t} - 2C_{x,t} + C_{x+1,t})\} + C_{x,t-1} $$


The forward computational form of the Schrodinger Difference Equation. What is 
counter-intuitive but absolutely true is that it is exactly reversible when computed 
on any ordinary computer even though information is seemingly lost at every step 
due to roundoff and truncation error! 


$$ C_{x,t-1} = -\{2ik\,(C_{x-1,t} - 2C_{x,t} + C_{x+1,t})\} + C_{x,t+1} $$


Above, the reverse form of the Schrodinger Difference Equation. It exactly retraces 
the steps (in the opposite order) of the forward form. 
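The forward and reverse computational forms can be sketched in Python on a small one-dimensional lattice. Everything specific here is an assumption made for illustration: the value of `K`, the 16-site periodic ring, the initial bump, and the choice to store each amplitude as a pair of integers (real part, imaginary part). The rounding inside `rounded_term` plays the role of the curly braces.

```python
K = 0.05  # hypothetical value of k, chosen only for illustration


def rounded_term(cur):
    """{2ik(C[x-1] - 2C[x] + C[x+1])} on a periodic ring, rounded to integer parts."""
    n = len(cur)
    out = []
    for x in range(n):
        lap_re = cur[x - 1][0] - 2 * cur[x][0] + cur[(x + 1) % n][0]
        lap_im = cur[x - 1][1] - 2 * cur[x][1] + cur[(x + 1) % n][1]
        # (lap_re + i*lap_im) * 2ik = (-2k*lap_im) + i*(2k*lap_re)
        out.append((round(-2 * K * lap_im), round(2 * K * lap_re)))
    return out


def forward(prev, cur):
    """Future = {function of the present} + past, using exact integer addition."""
    term = rounded_term(cur)
    nxt = [(p[0] + t[0], p[1] + t[1]) for p, t in zip(prev, term)]
    return cur, nxt


def backward(cur, nxt):
    """Past = future - {function of the present}, using exact integer subtraction."""
    term = rounded_term(cur)
    prev = [(f[0] - t[0], f[1] - t[1]) for f, t in zip(nxt, term)]
    return prev, cur


cur0 = [(0, 0)] * 16
cur0[7], cur0[8], cur0[9] = (500, 0), (1000, 0), (500, 0)
start = (list(cur0), list(cur0))  # hypothetical initial condition: past equals present

state = start
for _ in range(200):
    state = forward(*state)
for _ in range(200):
    state = backward(*state)
assert state == start
```

Exact retracing does not depend on `K` or on how the rounding is done, only on the final integer addition having an exact inverse.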


What follows is the derivation of what might be called the simplified, zero di- 
mensional Schrodinger Equation. We start by doubling the time steps from t (as 
used above) to 2t, so that we can do half time step calculations. We also assume 
that the complex amplitudes are divided in time so that all Real values occur at 
odd time steps 2t + 1 ($C_{x,2t+1}$ is real) and all Imaginary values occur at even time 
steps 2t ($C_{x,2t}$ is imaginary). 


$$
\begin{aligned}
C_{x,2t+2} &= 2ik\,C_{x,2t} + C_{x,2t-2} &&\text{C complex, double step}\\
R_{x,2t+1} &= 2ik\,I_{x,2t} + R_{x,2t-1} &&\text{The Real half step}\\
I_{x,2t+2} &= 2ik\,R_{x,2t+1} + I_{x,2t} &&\text{The Imaginary half step}\\
S_{x,t+1} &= 2ik\,S_{x,t} + S_{x,t-1} &&\text{Get rid of R and I labels}\\
P_{x,t} &= |S_{x,t}|^2 + |S_{x,t+1}|^2 &&\text{Simple minded probability}\\
P_{x,t} &= S_{x,t}^2 - S_{x,t-1}\,S_{x,t+1} &&\text{Feynman's probability}
\end{aligned}
$$


What follows is a series of Mathematica functions that illustrate numerically 
these properties of reversibility and conservation of probability 


ClearAll[S]; S[1] = 1000; S[0] = 0; k = Pi/256; 
S[t_] := S[t] = Round[2 I k S[t-1]] + S[t-2] 


We define the Mathematica function for the zero dimensional Schrodinger Equation. 
While we compute with integers, the quantities we represent are actually real and 
complex numbers. The integer 1,000 in the computer represents the number 1.0 in 
the physical model. 


Table[S[n], {n,1,10}] 
{1000, 25 I, 999, 50 I, 998, 74 I, 996, 98 I, 994, 122 I} 
The first 10 values computed in the forward direction. 


ClearAll[Sr]; Sr[10] = S[10]; Sr[9] = S[9]; k = Pi/256; 
Sr[t_] := Sr[t] = - Round[2 I k Sr[t+1]] + Sr[t+2] 


We define the Mathematica function for the zero dimensional Schrödinger Equation 
running in the reverse direction. 


Table[Sr[n], {n,10,1,-1}] 
{122 I, 994, 98 I, 996, 74 I, 998, 50 I, 999, 25 I, 1000} 


We get the exact same values going in the reverse direction as in the forward direc- 
tion. 
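For readers without Mathematica, the same forward and reverse runs can be sketched in Python. The sketch assumes k = π/256, which is the value that reproduces the ten numbers printed above, and stores each amplitude as a pair of integers (real part, imaginary part); the function names are illustrative. Python's `round` rounds halves to even, so no tie-breaking difference arises here.

```python
import math

K = math.pi / 256  # assumed value of k; it reproduces the table printed above


def times_2ik_rounded(z):
    """Round(2 i k z) for z = (re, im); each part is rounded to an integer."""
    re, im = z
    return (round(-2 * K * im), round(2 * K * re))


def run_forward(s0, s1, count):
    """Return [S[1], ..., S[count]] from S[t] = Round(2ik S[t-1]) + S[t-2]."""
    seq = [s0, s1]
    while len(seq) <= count:
        t = times_2ik_rounded(seq[-1])
        seq.append((t[0] + seq[-2][0], t[1] + seq[-2][1]))
    return seq[1:]


def run_backward(s_last, s_before_last, count):
    """Retrace from (S[n], S[n-1]) using S[t] = -Round(2ik S[t+1]) + S[t+2]."""
    seq = [s_last, s_before_last]
    while len(seq) < count:
        t = times_2ik_rounded(seq[-1])
        seq.append((seq[-2][0] - t[0], seq[-2][1] - t[1]))
    return seq


fwd = run_forward((0, 0), (1000, 0), 10)
bwd = run_backward(fwd[-1], fwd[-2], 10)
print(fwd)  # [(1000, 0), (0, 25), (999, 0), (0, 50), (998, 0), (0, 74), (996, 0), (0, 98), (994, 0), (0, 122)]
assert bwd == fwd[::-1]
```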


Table[(Abs[S[i] + S[i+1]]^2)/1000.0, {i, 1, 24, 2}] 
{1000.62, 1000.5, 1001.48, 1001.62, 1002.92, 1003.4, 1003.7, 
1003.93, 1004.01, 1005.29, 1005.85, 1005.7} 


The conventional definition of the probabilities hovers near to the value 1,000 (which 
represents probabilities of 1.0). Since we multiply two scaled numbers in computing 
the probabilities, we have to also divide by the scale factor of 1,000. 


ClearAll[Sp]; Sp[1] = SetPrecision[1000.0, 40]; 
Sp[0] = 0; k = Pi/256; Sp[t_] := Sp[t] = 2 I k Sp[t-1] + Sp[t-2]; 
Table[(Sp[i]^2 - Sp[i+1] Sp[i-1])/1000, {i, 1, 24, 2}] 
{1000.000000000000000000000000000000000000, 
1000.000000000000000000000000000000000000, 
1000.000000000000000000000000000000000000, 
1000.000000000000000000000000000000000000, 
1000.000000000000000000000000000000000000, 
1000.000000000000000000000000000000000000, 
1000.000000000000000000000000000000000000, 
1000.000000000000000000000000000000000000, 
1000.000000000000000000000000000000000000, 
1000.000000000000000000000000000000000000, 
1000.000000000000000000000000000000000000, 
1000.000000000000000000000000000000000000} 


Feynman’s definition of the amplitudes would always be exactly one if calculated 
with infinite precision. (Remember, all numbers are scaled up by 1,000) 
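The exact conservation can be checked without floating point at all. The Python sketch below is an assumption-laden illustration: it uses a rational k = 1/80 (not the value used in the Mathematica session) so that `fractions.Fraction` arithmetic is exact, and it stores complex numbers as (real, imaginary) pairs. The algebra behind it is short: substituting S[t+2] = 2ik S[t+1] + S[t] into S[t+1]² − S[t] S[t+2] gives −(S[t]² − S[t−1] S[t+1]), so the quantity flips sign each step and is therefore constant over odd t.

```python
from fractions import Fraction

K = Fraction(1, 80)  # hypothetical rational k, so that every operation below is exact


def mul(a, b):
    """Product of two complex numbers stored as (re, im) pairs."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])


def unrounded_sequence(count):
    """S[0..count] from S[t] = 2ik S[t-1] + S[t-2], with S[0] = 0, S[1] = 1000, no rounding."""
    two_ik = (Fraction(0), 2 * K)
    seq = [(Fraction(0), Fraction(0)), (Fraction(1000), Fraction(0))]
    while len(seq) <= count:
        t = mul(two_ik, seq[-1])
        seq.append((t[0] + seq[-2][0], t[1] + seq[-2][1]))
    return seq


def feynman_probability(seq, t):
    """S[t]^2 - S[t-1] S[t+1], returned as a (re, im) pair."""
    square = mul(seq[t], seq[t])
    cross = mul(seq[t - 1], seq[t + 1])
    return (square[0] - cross[0], square[1] - cross[1])


S = unrounded_sequence(26)
values = [feynman_probability(S, t) for t in range(1, 25, 2)]
assert all(v == (1_000_000, 0) for v in values)
```

Dividing 1,000,000 by the scale factor of 1,000 gives the 1000 shown in the table above.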


{Sp[10], Sp[11], Sp[12]} 
{122.42295112400517290576729397847903102831, 
990.976801879307127359731098739346914585, 
146.74518112917657354065492213527643966741} 


(Sp[11]^2 - Sp[10] Sp[12])/1000 
1000.000000000000000000000000000000000000 


The above sample of data further illustrates the Feynman definition of probability. 


ComplexToList[phi_] := 
{Re[phi], Im[phi]}; 

ListPlot[Table[ComplexToList[S[i] + S[i+1]], 
{i, 1, 512, 2}], AspectRatio -> Automatic] 


Fig. 20.1. The plot of the amplitudes in the complex plane. 

N.B. Such reversible computations may be thought of as the computation of 
any function of the present (with any roundoff and/or truncation error), after which 
we either add or subtract the past (with no error) in order to compute the future. For 
exact reversibility, it is necessary that the last step of the computation be effectively 
done in such a way that there is an exact inverse (e.g. integer addition is the exact 
inverse of integer subtraction). The exact reversibility is completely unaffected by 
whatever kinds of roundoff or truncation error occur, so long as that error is confined 
to computing the present. 


21 


ACTION, OR THE FUNGIBILITY OF 
COMPUTATION 


Tommaso Toffoli 


Abstract 


We informally explore an emergent interpretation of the action integral in physics, 
and discuss its connections with the concept of “amount of computation”. Much 
as entropy quantifies the lack of information one has about the state of a system, 
action seems to quantify the lack of information about the system’s law. 

From given causes, effects are generated by nature in 
the most efficient way. ... No natural action can be 
abbreviated. [Leonardo] 


An institution will grow until it uses up all available 
resources. [Parkinson’s law] 


21.1 Introduction 


Physical entropy measures amount of information; here I'll argue that physical 
action measures amount of computation. My approach will be an impressionistic 
one: motivation, metaphors, circumstantial evidence, toy examples. This is a still 
quite unsettled area of research (Jaynes [1] is a particularly illuminating work in 
this regard); my paper is an invitation, not a review. 


We are taught to regard with awe the variational principles of mechanics [2, 3]. 
There is something miraculous about them, and something timeless too: the storms 
of relativity and quantum mechanics have come and gone, but Hamilton’s principle 
of least action still shines among our most precious jewels. 


But perhaps the reason that these principles have survived such physical up- 
heavals is that after all they are not strictly physical principles! To me, they appear 
to be the expression, in a physical context, of general facts about computation, much 
as the second law of thermodynamics is the expression, in the same context, of gen- 
eral facts about information (cf. [4]). More specifically, just as entropy measures, 
on a log scale, the number of possible microscopic states consistent with a given 
macroscopic description, so I argue that action measures, again on a log scale, the 
number of possible microscopic laws consistent with a given macroscopic behavior. 
If entropy measures in how many different states you could be in detail and still be 
substantially the same, then action measures how many different recipes you could 
follow in detail and still behave substantially the same. 


Information can be quantified because it is fungible.¹ In virtually any 
circumstances, to the telephone user it does not make a difference whether a conversation 
travels by copper cable or by microwave link. What the user is charged for is not the 
message itself or the use of the medium per se, but merely the variety of alternative 
messages that could have been sent. The importance of ‘amount of information’ 
in physics is enhanced by the fact that at a microscopic level physics is invertible, 
and thus ‘amount of information’ is a conserved quantity; as a consequence, phase 
space must flow as an incompressible fluid (Liouville’s theorem), and this imposes 
a strict discipline on the possible dynamics. 


To what extent is computation fungible? Let us imagine two technicians doing 
numerical integration of a differential equation using Mathematica, one on a Mac 
and the other on a PC. A pedantic physicist who kept track of every nucleus, 
electron, and photon would observe totally different histories in the two computers. 
Even at the coarser level of machine instructions and memory words, what is going 
on in one computer is very different from the other. Yet one can meaningfully 
say that the two computers are running “the same program.” Of course, the two 
programs are quite different (try loading and running on the PC the executable 
file for the Mac!); what we mean is that the differences in the two programs are 
deliberately engineered to compensate for the differences in the two machines, so 
that they will yield the same overall behavior. 


Thus, while at a low level (say, response to a DMA interrupt) the two machines 
are not interchangeable, at a high enough aggregation level (say, doing numerical 
analysis) they are substantially equivalent. Not only can the two machines simulate 
one another (that’s a consequence of their computation universality, and compu- 
tation universality is cheap); on almost all macroscopic tasks they can do so with 
essentially the same timing—up to an overall proportionality factor k. At this 
level, computation is apparently fungible. That is, the internal details can be made 
irrelevant by appropriate programming, and all that remains is a single scalar 
parameter k, the machine’s computation capacity. If the Mac has capacity 1.5 times 
that of the PC, then on most tasks a laboratory equipped with 30 PCs will be 
able to make in a month the same amount of computational progress as one equipped 
with 20 Macs. (Only rare, perversely chosen tasks will be able to show significant 
performance differences between the two laboratories.) In this scenario, computation 
capacity becomes a tradable commodity that can be sold “by the pound” like 
potatoes, and thus is a fungible resource (cf. [5]). 
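The arithmetic of the two laboratories can be written out as a tiny Python sketch. The units, the names and the 90-unit task are arbitrary assumptions; only the ratio 1.5 and the machine counts come from the text.

```python
PC_CAPACITY = 1.0                  # arbitrary unit of computation capacity
MAC_CAPACITY = 1.5 * PC_CAPACITY   # the text's hypothetical ratio


def lab_capacity(machines, capacity_per_machine):
    """Total capacity of a laboratory, treating capacity as a single additive scalar."""
    return machines * capacity_per_machine


def months_to_finish(work, machines, capacity_per_machine):
    """Time for a task of size `work` (capacity-months) on a given laboratory."""
    return work / lab_capacity(machines, capacity_per_machine)


assert lab_capacity(30, PC_CAPACITY) == lab_capacity(20, MAC_CAPACITY)
assert months_to_finish(90.0, 30, PC_CAPACITY) == months_to_finish(90.0, 20, MAC_CAPACITY)
```

Treating capacity as one additive number is exactly the fungibility assumption; the "rare, perversely chosen tasks" are the ones for which this model fails.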


¹ According to the American Heritage Dictionary, “Being of such a nature that one unit or 
part may be exchanged or substituted for another equivalent unit or part in the discharging of an 
obligation.” 


Note that information capacity (e.g., channel capacity or storage capacity), 
which comes to mind as a typical example of fungible resource, only becomes so in 
an appropriate macroscopic scenario. To begin with, to take advantage of Shan- 
non’s theorems one must handle information in bulk, doing long-term buffering and 
averaging. Moreover, to arrive at a well-defined value for capacity one must ignore 
a number of ancillary costs (encoding and decoding resources, latency, etc.). 


Our questions, then, are: In what context and to what extent is the concept 
of ‘fungible computation’ viable? How can it be used? What is its relevance to 
physics? What are its connections, if any, to the concept of ‘action’ and the least- 
action principle? 


21.2 Computation capacity 


Computer science already has two quite distinct concepts that relate to “how hard 
it is to compute a given function f.” Computability [6] asks whether or not there
is an algorithm that for any value x of the argument will produce the corresponding 
result y = f(x) in a finite number of steps; the issue is really whether the function 
can be computed at all. Computation complexity [7] asks how the number of 
steps (or other quantities such as the amount of intermediate storage) needed to 
produce y grows as a function of the size of x. To classify an algorithm, one compares 
the asymptotic rate of growth (as the size of x goes to infinity), or whatever bounds 
for it can be ascertained, with a number of reference rates (linear, polynomial,
exponential, etc.).


We are interested in a different concept, namely, the “computational worth” of 
a computer; intuitively, how much one should charge for its rental in a competitive 
market in which one and the same computer may be put to different uses (which 
are not known in advance, at fabrication time or even at rental time) and different 
computers may well be put to the same use if expedient. We'd like this measure of 
computational worth to be defined in terms of function, not of structure (i.e., with- 
out reference to physical or technological parameters such as area, speed, number 
of gates, internal wiring); however, if it is to express the (presumed) fungibility of 
computing resources, it should be approximately additive when applied to physical 
computation in bulk amounts. 


For the time being, a computer will mean a finite combinational network in
which a distinguished subset of input lines constitute the program p, the rest of the
input lines constitute the argument x, and the output lines constitute the result
y.² The behavior of the computer is the functional relationship between argument
x and result y.


2 An ordinary computer operated for a fixed number n of clock cycles is a machine in this sense,
as the sequential network that makes up the computer can be “unrolled in time” into an n-stage
combinational network; the input is represented by the initial state of the computer’s memory,
and the output by its final state; different values for n yield different “machines”. Real-time input


We'll provisionally define the computation capacity of a computer, viewed 
as a programmable machine, as the log of the number of distinct behaviors the 
machine can achieve over the range of possible settings or programs. While different 
behaviors must come from different programs, two or more programs may yield the 
same behavior. We are interested in measuring the variety of actual behaviors, as 
contrasted with the variety of settings. For example, a “ten-speed” bicycle has two
levers, one with two settings (A and B) and one with five, giving ten settings overall. 
However, the A and B gear-ratio ranges overlap, as indicated in Fig. 21.1, giving a 
smaller number of effectively distinct “speeds”. 


[Figure 21.1 table: gear ratios for lever settings A and B; the two ranges overlap.]


Fig. 21.1. In a “ten-speed” bicycle, the two five-speed ranges substantially overlap, giving 
fewer than ten effectively distinct gear ratios. Here there are only nine distinct ratios, and, 
on a coarser grain, only six or seven practically different “speeds”. 
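
The provisional definition above can be checked by brute force on a very small machine. The following is a minimal sketch, not the author's construction: `toy_machine` is a hypothetical three-program-bit device invented here so that several settings collapse onto the same behavior, much as the overlapping gear ranges do.

```python
from itertools import product
from math import log2


def behaviors(machine, n_program_bits, n_argument_bits):
    """Return the set of distinct argument -> result tables the machine can realize."""
    arguments = list(product((0, 1), repeat=n_argument_bits))
    tables = set()
    for program in product((0, 1), repeat=n_program_bits):
        # The behavior under one program is the whole truth table.
        tables.add(tuple(machine(program, x) for x in arguments))
    return tables


def capacity(machine, n_program_bits, n_argument_bits):
    """Computation capacity in bits: log2 of the number of distinct behaviors."""
    return log2(len(behaviors(machine, n_program_bits, n_argument_bits)))


def toy_machine(program, x):
    """Hypothetical machine: p0 and p1 matter only through p0 XOR p1."""
    p0, p1, p2 = program
    return ((p0 ^ p1) & x[0]) ^ (p2 & x[1])


n_settings = 2 ** 3
n_behaviors = len(behaviors(toy_machine, 3, 2))
print("settings:", n_settings)
print("distinct behaviors:", n_behaviors)
print("capacity (bits):", capacity(toy_machine, 3, 2))
```

Here eight settings yield only four distinct behaviors, so the capacity is two bits rather than three; the count is of behaviors, not of programs.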


Much as for the case of channel capacity, this basic definition can be refined and 
extended in several ways. First, instead of giving equal weights to all behaviors, one 
may wish to take into account with what probability each behavior occurs, which in 
turn reflects the probabilities of the programs that gave rise to it. Second, one may 
wish to consider a statistical rather than deterministic dependence of behavior on 
program (“noisy computer”), and count as computation capacity only that fraction 
of the variety of behavior that can be attributed to deliberate programming (rather 
than to chance). This approach may be further extended by including the peculiar 
statistics of quantum-mechanical behavior as part of the specification of the com- 
puter. Third, one may wish to consider summing the number of behaviors over 
all possible ways of partitioning the input lines into program lines and argument 
lines, and all possible ways to partition the output lines into result lines and lines 
to be ignored; this is equivalent to identifying the range of behavior with the set of 
all possible subfunctions that the network can realize. In this way, the distinction 
between argument and program disappears, and computation capacity becomes as- 
sociated simply with the overall input/output relationship of the computer. Finally, 
one may define all sorts of conditional capacities by imposing constraints on inputs, 
outputs, or their relationship; for instance, demand that the argument, seen as a 
bit string, consist of twice as many zeros as ones. 


and output, as through a console or a serial port, are represented by lines entering or leaving the 
network at each of the n stages. 
We shall illustrate the basic definition of computation capacity with an example,
namely the “canonical” computer consisting of an ordinary PROM (Programmable
Read-Only Memory). This is a network consisting of a single node with m input
lines collectively called the address, n output lines called the data, and k more input
lines (where $k = n2^m$) called the program.




Fig. 21.2. The PROM as a canonical computer, with m lines of address x, n lines of data
y, and $n2^m$ bits of program p.


In this canonical machine, each program gives rise to a distinct function y = p(x).
The number of functions is thus $2^{n2^m}$, and its log (in base 2) is simply $n2^m$; that
is, the computation capacity equals the number of PROM program bits.
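
The PROM count can be verified by enumeration for small m and n. This is an illustrative sketch only; the word layout inside `prom` (word i occupying program bits i·n to i·n + n − 1) is an assumption made here for concreteness.

```python
from itertools import product
from math import log2


def prom(program, address, n):
    """Read the n-bit word stored at `address` (assumed layout: words stored consecutively)."""
    index = 0
    for bit in address:
        index = index * 2 + bit
    return program[index * n:(index + 1) * n]


def count_prom_behaviors(m, n):
    """Enumerate all programs of an m-address-line, n-data-line PROM and count distinct functions."""
    k = n * 2 ** m  # number of program bits
    addresses = list(product((0, 1), repeat=m))
    tables = set()
    for program in product((0, 1), repeat=k):
        tables.add(tuple(prom(program, a, n) for a in addresses))
    return len(tables)


for m, n in [(1, 1), (2, 1), (1, 2), (2, 2)]:
    count = count_prom_behaviors(m, n)
    print(f"m={m} n={n}: functions={count}, log2={log2(count)}, program bits={n * 2 ** m}")
```

In every case the log of the number of functions coincides with the number of program bits, since no two PROM programs produce the same table.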


Note that in a real PROM chip the programmable bits take up most of the chip’s 
real estate (the address decoding tree occupies a much smaller area and consists of
fixed, nonprogrammable gates). Thus, our measure of computation capacity (log
of number of functions) is consistent with the market reality, where PROM chips of 
different organization (for example, 32- vs 8-bit wide) but with the same number 
of programmable bits have essentially the same size, cost essentially the same, and 
have a comparable market share. 


If a comprehensive theory of computation capacity along the present lines can 
be developed at all, it is likely to be much more complex than Shannon’s theory of 
communication (or channel) capacity. In the latter, a channel of given capacity S 
can be shared between two users who will enjoy capacities $S_1$ and $S_2$ respectively.
With appropriate encoding and decoding, $S_1 + S_2$ can be made as close as desired
to S; however, the computational resources needed for encoding and decoding will
in general rise steeply as $S_1 + S_2$ approaches S. Thus, communication capacity is
additive only when the cost of computing resources does not enter the picture. If we
apply a similar construction to the sharing of a computer rather than of a channel,
the resources used for encoding and decoding will be of the same kind as those that
we are trying to share, and cannot be ignored. Thus, we will be confronted with a
nonlinear theory. We may only hope to get an approximately linear theory when
the amount of computation done between the encoding and decoding stages is so
large that the pro-rated cost of the latter becomes negligible. In this sense, the
fungibility of computation capacity cannot be expected to extend down to the scale
of a single gate, much as the price of a pint-bottle of water has little relation to
the cost of water in bulk.


For the same reasons, if we try to apply a theory of computation capacity to a 
distributed computing medium—a network extending over space and time—little 
insight can be expected from the theory unless the network’s interconnection is local 
and its texture remains uniform over a scale of many node spacings. A local and 
uniform combinational network extending over one or more spatial dimensions
(and, of course, one time dimension) is a cellular automaton (Fig. 21.3).


Fig. 21.3. Spacetime representation of a simple one-dimensional cellular automaton. Each 
logic gate f of the combinational network represents the activity of one cellular automaton 
“cell” through one clock cycle. 


In what follows, the computing networks we'll be interested in will be dis- 
crete, fine-grained models of continuous dynamical systems: continuous behavior 
will emerge from the underlying discrete activity by coarse-grained averaging. Our 
goal will be to establish a bridge between the variational principles that appear 
to govern the macroscopic behavior of a system and computation-capacity argu- 
ments (and thus, ultimately, merely counting arguments) applied to its microscopic 
description. 


Our definition of ‘computation capacity’ rates the power of a computer in terms 
of how many different tasks it can be programmed to do in a fixed amount of time. 
What does that have to do with the difficulty of the tasks themselves? Why should 
one pay much for a computer that is able to do zillions of things, if all of them turn 
out to be trivially simple? 


The answer lies in the pigeon-hole principle. The number n(s) of possible tasks
having a certain level s of complexity³ increases very rapidly with the complexity
itself. Thus, if we are told that a computer can do a very large number n of different
things but we aren’t told which things, we nonetheless know that there just aren’t
enough simple things to exhaust its range of behavior, and we conclude that this
range must include some tasks above the complexity level s(n) (where s(n) is the
inverse function of n(s)).


3 By ‘complexity of a computational task’ we may take the minimum number of operations,
on a reference computer architecture, required to complete the task, just as the ‘complexity of a
piece of information’ may be taken as the minimum number of statements, in a reference computer
language, required to print out that information.


More concretely, suppose we want to choose a rental computer for doing a task 
of complexity s, and all that the rental catalog says about each computer they offer, 
besides its rental cost, is the number n of different things it can do. If for the sake 
of economy we choose a computer with n ≪ n(s), we can be almost sure that our
task will not be among those the computer can complete in the given time. On the
other hand, if we pick n ≫ n(s), then there are good chances that our task will be
in the computer’s repertoire. If we do not want to waste our money, we must be 
ready to pay enough to get us a computer with a rating at least n(s). 


Finally, what about fungibility? If computers are priced proportionally to log n,
and by putting together two computers rated $n_1$ and $n_2$ I effectively get a rating
$n_1 n_2$, then for a price $\log n_1 + \log n_2$ I will be able to do typical tasks of complexity
$\sim s(n_1 n_2)$, just as if I rented, for the same price, a single computer of rating $n_1 n_2$.
Alternatively, to get a rating $n_1 n_2$ from a low-rating computer I can rent it for a
time proportional to $\log n_1 + \log n_2$. In other words, in an ideal market it should
be possible to achieve the capacity of a supercomputer, at a comparable cost, by
putting together a network of cheap (and weak) computers or by running a small
computer for a sufficiently long time.
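
The multiplicative rating, and hence the additive price, can be illustrated in the idealized case where two machines are simply placed side by side with no shared lines. The machines `and_or` and `masked` below are hypothetical toys chosen for this sketch; real composition would incur the coupling overheads discussed earlier.

```python
from itertools import product
from math import isclose, log2


def behavior_set(machine, n_program_bits, n_argument_bits):
    """All distinct truth tables reachable over the machine's programs."""
    args = list(product((0, 1), repeat=n_argument_bits))
    return {tuple(machine(p, x) for x in args)
            for p in product((0, 1), repeat=n_program_bits)}


def and_or(p, x):
    """Hypothetical machine 1: one program bit selects AND or OR."""
    return (x[0] | x[1]) if p[0] else (x[0] & x[1])


def masked(p, x):
    """Hypothetical machine 2: two program bits mask two inputs before XOR."""
    return (p[0] & x[0]) ^ (p[1] & x[1])


def side_by_side(p, x):
    """Both machines run independently on disjoint program and argument lines."""
    return (and_or(p[:1], x[:2]), masked(p[1:], x[2:]))


n1 = len(behavior_set(and_or, 1, 2))
n2 = len(behavior_set(masked, 2, 2))
n12 = len(behavior_set(side_by_side, 3, 4))
print("ratings:", n1, n2, n12)
print("log ratings:", log2(n1), log2(n2), log2(n12))
```

The combined rating is the product of the individual ratings, so a price proportional to log n is additive for this uncoupled pair.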


21.3 Density of computation capacity 


While mass is well-defined even for a microscopic sample of material, intensive 
properties such as density or conductivity are poorly defined and vary with the size 
and shape of the sample. It is only for macroscopic samples of uniform constitution, 
in which finite-size effects become negligible, that these quantities converge to a 
value characteristic of the material and independent of sample size and shape. In 
a similar way, well-characterized intensive properties may emerge when one deals 
with a large computational network of uniform texture. What we are interested in 
is fine-grained computing networks for which the concept of density of computation 
capacity is meaningful. Before investing too much in precisely defining this property, 
it is important to have an idea of how one would recognize it and make use of it. 


Consider an indefinitely extended network such as that of Fig. 21.3, and cut out a
swatch of it of width x and depth t, gluing together the two spatial ends of the strip;
this will constitute a computer according to our definition (§21.2), and will have a
certain computation capacity A(x, t). Since information only travels at the speed
of one site per step, when x ≫ t the swatch is effectively decoupled into a number
of almost independent spatial regions. The number of functions computed by the
swatch is thus approximately the product of the number of functions computed by
the individual regions, and the overall capacity (log of number of functions) will be
the sum of the separate capacities. For fixed $t = t_0$ and large enough x, the capacity
of the swatch will grow linearly with x.


Let us fix an $x = x_0$ large enough to be in this linear region (the choice of $x_0$ will
depend on $t_0$, as linearity will set in later when $t_0$ is larger) and let us start moving
t upwards from the reference value $t_0$. Typically, the computation capacity of the
swatch will tend to increase with t, as formerly uncoupled regions of the swatch
now have the time to become coupled. However, as t grows to be much larger than
$x_0$, the “computational trajectories” that the data describe as they evolve through
time will tend to cycle (all trajectories will of course have entered a cycle by time
$t = 2^{x_0}$), and eventually the computation capacity of the swatch will flatten out.


We are interested in the initial relative slope,


$$C = \left.\frac{1}{x}\,\frac{\partial A}{\partial t}\right|_{x = x_0(t_0)}.$$


If, for a sufficiently large depth $t_0$ of the swatch, the quantity C is independent of $t_0$
itself, then this quantity is an intensive property of the network’s texture (i.e., of the
cellular automaton’s local structure), and constitutes the density of computation
capacity of the cellular automaton; the total computation capacity of a swatch of
depth t and width x ≫ t will then be simply Cxt, and the cellular automaton can
be rented at a flat rate C per unit of spacetime volume.


21.4 Specific ergodicity 


The present section is a digression. There still is precious little theoretical
understanding of how likely a network texture is to have a well-defined density of
computation capacity. Here we give concrete experimental evidence for the exis- 
tence of an intensive network quantity, namely, specific ergodicity, which, though 
distinct from density of computation capacity, is intimately related to it. 


Consider a discrete dynamical system (for instance, a finite-state automaton)
whose state set consists of L points. The entropy of the state set is


$$H_{\text{total}} = \log L,$$


in the sense that if you choose a point at random it will take me on average $\log_2 L$
binary questions to correctly guess your point. Consider now an arbitrary invertible
dynamics (a permutation) τ on this set. Such a dynamics partitions the set into a
collection of orbits, and for any point one can speak of the length ℓ of the orbit to
which it belongs. The entropy of (the partition induced by) τ is


$$H_{\text{orbit}} = -\sum_{\text{all orbits}} \frac{\ell}{L} \log \frac{\ell}{L};$$


this is the average number of questions needed to guess just which orbit the point
belongs to. On the other hand, if I were told right away on which orbit the chosen
point lies, I would have to ask on average only a number of questions equal to


$$H_{\text{phase}} = \sum_{\text{all points}} \frac{\log \ell}{L}$$


in order to locate the point. Of course, $H_{\text{total}} = H_{\text{orbit}} + H_{\text{phase}}$.


The specific ergodicity η of the dynamics τ is defined as


$$\eta = \frac{H_{\text{phase}}}{H_{\text{total}}} = \frac{\langle \log \ell \rangle}{\log L}. \tag{21.1}$$


That is, η tells us what fraction of the total uncertainty is uncertainty as to the
phase (or position on the orbit). Clearly, 0 ≤ η ≤ 1. When η = 1, all points lie
on a single orbit, as in a car odometer; the system is ergodic. When η = 0, each
orbit consists of a single point, and the system is useless as a counter or a clock.
Intuitively, η measures how efficiently the system uses its state variables to “count”.
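
As a concrete check of these definitions, the sketch below computes the three entropies and the specific ergodicity for a permutation given as a list, where `perm[i]` is the successor of point i. The three example permutations are chosen here for illustration and do not come from the text.

```python
from math import isclose, log2


def orbit_lengths(perm):
    """For each point, the length of the orbit of the permutation that contains it."""
    size = len(perm)
    length = [0] * size
    for start in range(size):
        if length[start]:
            continue  # orbit already visited
        orbit, p = [], start
        while True:
            orbit.append(p)
            p = perm[p]
            if p == start:
                break
        for q in orbit:
            length[q] = len(orbit)
    return length


def entropies(perm):
    """Return (H_total, H_orbit, H_phase) in bits."""
    size = len(perm)
    lengths = orbit_lengths(perm)
    h_total = log2(size)
    h_phase = sum(log2(l) for l in lengths) / size
    # Averaging log(l/L) over points equals summing (l/L) log(l/L) over orbits.
    h_orbit = -sum(log2(l / size) for l in lengths) / size
    return h_total, h_orbit, h_phase


def specific_ergodicity(perm):
    h_total, _, h_phase = entropies(perm)
    return h_phase / h_total


odometer = [(i + 1) % 8 for i in range(8)]  # a single orbit of length 8
identity = list(range(8))                   # eight orbits of length 1
mixed = [1, 2, 3, 0, 5, 4, 6, 7]            # orbits of length 4, 2, 1, 1

for name, perm in [("odometer", odometer), ("identity", identity), ("mixed", mixed)]:
    print(name, entropies(perm), specific_ergodicity(perm))
```

The odometer and the identity give the two extreme values 1 and 0, and in every case the orbit and phase entropies add up to log L.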


We would like to extend the above considerations to infinite systems such as
cellular automata. In this case, though definition (21.1) breaks down (since $H_{\text{total}} = \infty$
and in general also $H_{\text{phase}} = \infty$), it may be replaced by a definition entailing a
limit.

Let τ be an invertible cellular automaton. In the one-dimensional case, one can
imagine “wrapping around” the system so that site i is identified with site i + N
(periodic boundary conditions), yielding a finite system $\tau_N$ distributed over N sites
and consisting of $L = Q^N$ state points (where Q is the number of states available
to each site).


Let $\eta_N$ be the ergodicity of $\tau_N$. The specific ergodicity of τ itself is the limit, if
it exists,


$$\eta = \lim_{N \to \infty} \eta_N. \tag{21.2}$$

Except for a few trivial cases (identity transformation, Bernoulli shifts, etc.) it isn’t 
at all obvious that limit (21.2) should exist. Moreover, if the number of dimensions 
is greater than one, there are in general different wraparound aspect ratios (e.g., 
“squarish” vs “long and thin”) that yield the same number N of sites, so that a 
limit, if it exists, may depend on the aspect ratio itself. 
We have numerically evaluated the first few terms of the sequence (21.2) for a
number of simple cellular automata in one and two dimensions; this was done by
randomly sampling the state space and explicitly following the orbit of each sample
point until it closed on itself. In all cases, numerical convergence was rapid enough
to suggest the existence of a definite limit;⁴ moreover, in the two-dimensional cases,
the dependence on the aspect ratio was negligible. We’ll present here the evidence 
for a few representative cases, and conclude with a brief discussion. 
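
The sampling procedure just described can be sketched in a few lines. The rule used below is not one of the lattice gases studied in the text: it is a hypothetical partitioned binary automaton, made invertible by applying a permutation of two-cell blocks (`EXAMPLE_RULE`, an arbitrary choice) first to the even pairs and then to the odd pairs of a ring with an even number N of sites. Sampling states uniformly and averaging log ℓ estimates the numerator of (21.1) directly.

```python
import random
from math import log2

# Assumed block rules: each must be a permutation of the four two-cell states.
IDENTITY_RULE = {(a, b): (a, b) for a in (0, 1) for b in (0, 1)}
EXAMPLE_RULE = {(0, 0): (0, 0), (0, 1): (1, 0), (1, 0): (1, 1), (1, 1): (0, 1)}


def step(state, block_rule):
    """One update of the wrapped automaton: even pairs first, then odd pairs."""
    n = len(state)
    cells = list(state)
    for offset in (0, 1):
        for i in range(offset, n, 2):
            j = (i + 1) % n  # periodic boundary conditions
            cells[i], cells[j] = block_rule[(cells[i], cells[j])]
    return tuple(cells)


def orbit_length(state, block_rule):
    """Follow the orbit of `state` until it closes on itself."""
    length, s = 1, step(state, block_rule)
    while s != state:
        s = step(s, block_rule)
        length += 1
    return length


def estimate_eta(n, block_rule, samples, rng):
    """Monte Carlo estimate of eta_N = <log2 l> / log2(2**n) for n binary sites (n even)."""
    total = 0.0
    for _ in range(samples):
        state = tuple(rng.randrange(2) for _ in range(n))
        total += log2(orbit_length(state, block_rule))
    return (total / samples) / n


rng = random.Random(0)
for n in (4, 6, 8, 10):
    print(n, estimate_eta(n, EXAMPLE_RULE, 20, rng))
```

Because each orbit must be followed to closure, the cost grows with the orbit length, which can be as large as the number of states; this is why only the first few terms of the sequence are accessible in practice.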


All the cellular automata considered here are lattice gases, where 0 and 1 rep- 
resent, respectively, the absence or the presence of a particle on a given spacetime 
track. The one-dimensional ones use the lattice of Fig. 21.4a; the two-dimensional 
ones, that of Fig. 21.4b. 




Fig. 21.4. Lattices used for the experiments, (a) in one dimension and (b) in two dimen- 
sions. 


Fig. 21.5 shows a case (FREDS⁵) where the specific ergodicity appears to be close
to zero (and indeed can be proved to be zero). The same figure shows two cases
where η goes up rapidly⁶ and appears to have an asymptotic value close to unity.


A more interesting case is illustrated by the rule ROTLR of Fig. 21.6, where η
seems likely to settle on a value (≈ 1/3) noticeably different from 0 or 1. In this
rule, the three inputs are fed straight through or circularly permuted depending on
their overall parity.


Fig. 21.7 illustrates the results for two two-dimensional cellular automata
discussed in [9]. BBM implements the billiard-ball model of computation [8], and is
particle-conserving; TRON implements a synchronization scheme for asynchronous
computation: the four input signals are complemented if and only if they are all the
same. In both cases, note how little $\eta_N$ depends on the wraparound aspect ratio.


4 The experiments were performed using a personal computer running for several days. In
view of the exponential complexity of the problem, even moderate improvements in the numerical
estimates would require a drastic increase in computing power.

5 The collision rule embodies the three-input/three-output Fredkin gate [8], with the control
line running along the middle track (“rest particle”) of each node. This rule is particle-conserving.

6 FREDR is a variation of FREDS: after performing the Fredkin-gate operation, the three outputs
are circularly permuted; RANDC is an invertible rule that was chosen at random, and is not
particle-conserving.


Fig. 21.5. Cases in which η appears to be close to 0 (FREDS) or to 1 (RANDC and FREDR).


[Plot for the ROTLR rule; horizontal axis from 0 to 20.]


Fig. 21.6. In the ROTLR rule, $\eta_N$ seems likely to settle on a value close to 1/3.




[Plot: curve labeled BBM, with points labeled by wraparound size (e.g., 3x4, 4x4); horizontal axis from 0 to 60.]


Fig. 21.7. Two-dimensional cases: the BBM rule and the TRON rule. 


To the extent that the limits suggested by the above evidence do indeed exist,⁷
we are witnessing the emergence, at a macroscopic level, of an intensive quantity η
(measured in “entropy per unit area”), associated with a cellular-automaton rule τ,
out of the extensive quantity $H_{\text{phase}}$ (a quantity having the dimension of entropy and
associated with $\tau_N$ treated as a lumped system). In other words, even though the
individual terms $\eta_N$ are measured on lumped “objects” of definite size and shape,
the specific ergodicity η emerges as a computational property of the “material” that
makes up a cellular automaton. As far as that property is concerned, the cellular
automaton can be sold “by the yard”. For instance, if one wants to synthesize an
n-bit counter within the cellular automaton, one will have to purchase at least n/η
bits of cellular automaton “stuff”.


7 See [10] for a little more discussion; the theoretical questions here are similar to those
encountered when studying the topological entropy of a dynamics [11].




21.5 A Kafkian scenario 


We have seen in §21.3 how it may be possible to associate to a computing network 
a quantity C representing its density of computation capacity. Suppose the texture 
of the network, and thus its macroscopic local properties, vary smoothly over space 
and time. Then we can view C as a function of x and t. What aspect of network
behavior will the knowledge of C(x, t) allow us to predict?


Let us introduce a somewhat Kafkian scenario [12]. Imagine a large bureaucracy 
(say, an insurance claims department) housed in an office complex. Each room 
contains a desk, a filing cabinet, a typewriter, an IN/OUT tray, and a staff person. 
Claims are submitted at an input window. A claim folder will move from room 
to room through different processing stages, growing and shrinking as evidence is 
accumulated and digested. Eventually, the claim response will be available at an 
output window. Through directives, office traditions, and instructions printed on 
the back of the forms, a folder more or less “knows” what step is due next at any 
particular stage of processing, and will “seek” appropriate available resources. If a 
clerk is on vacation his duties may be taken over by an adjacent office, or folders 
may pile up in his IN tray. Given the large amount of similar work, much parallel 
processing is present; for example, several rooms may be devoted to time-stamping 
incoming correspondence. Though at any moment different rooms may be assigned 
to different tasks, with a little remodeling and retraining both rooms and people 
are basically interchangeable. As a folder moves through the bureaucracy, its rate 
of progress dτ/dt (where τ denotes the processing stage the folder has reached, or
the case’s “proper time”) depends on the local availability of resources. Reassigning
a case to a new office if the primary venue is too busy will require work and time
and will slow down the case’s progress. 


The spacetime path of a folder through the bureaucracy will be a goal-oriented 
random walk. Basically, though the local details are somewhat unpredictable, the
folder will tend to progress toward its destination. Similar claim folders may take 
different times to reach their completion, depending on the congestion of the offices 
they go through and the accidents they encounter on their path. Suppose, now, 
that for a productivity study we take a collection of similar cases, submit them at
the same time $t_0$, and check which ones will appear completed at the output window
at precisely time $t_1$. (Some cases will have been completed earlier than $t_1$ and some
will be completed later; these are ignored.) For each case we trace its spacetime
trajectory, obtaining a bundle of trajectories as in Fig. 21.8a. 


We repeat the experiment the following year (Fig. 21.8b) in similar
circumstances, but this time there has been a flu epidemic (indicated by the shaded area)
affecting a number of contiguous rooms for a few days. Now, we shall state the
basic Bureaucracy equivalence principle:


As far as typical office work is concerned, a section with a flu epidemic
(or pregnant staff members, or preparing for a soccer match, or chronically
overloaded, and so forth) is behaviorally equivalent to an unaffected
section, except for an appropriate slow-down factor. (In other words,
from a macroscopic viewpoint, the precise details of the disturbance
wash out, and all that remains is not a different kind of computational
resources, but just a lesser amount.)


If we look at the distribution of trajectories that happened to process a case from
beginning to end in the interval $(t_0, t_1)$ we’ll find that, in the flu scenario, fewer made
it through in that time. Of those which did, most tended to follow a somewhat
curved path as in Fig. 21.8b. The reasons for this “deflecting force” are clear.
Folders whose zig-zag path passed through the flu epicenter got delayed; only a few
rare ones that managed to get stamped between two sneezes made it through by
$t_1$. Folders that strayed far away from the epicenter had to travel a much longer
distance, and only few were able to make it through by $t_1$. There remain those
which barely kept clear of the flu area, wasting a little time by this detour but
traversing full-efficiency offices all the time.


[Figure 21.8, panels (a) and (b): bundles of trajectories from (IN, $t_0$) to (OUT, $t_1$).]


Fig. 21.8. Drift of claim folders through a claims department. Only claims filed at time
$t_0$ and completed at time $t_1$ are sampled here. In (b), the shaded area had a flu epidemic
and had a lower work performance; cases that happened to skirt the affected area had
relatively better chances to make it by time $t_1$.


We can even propose a phenomenological law that “the flu center exerts a
gravitational force on the distribution of folder trajectories” and study the dependence
of this force on distance. Eventually, we’ll be able to explain this phenomenological
law through statistical arguments, and show that it is but the expression of a
variational principle. Let us observe that, for any specific folder path, the amount
of effective evolution of the folder from submission to response, which by definition
is a constant K (a case will not come out to the output window until its processing
is completed) cannot be more than the total amount P of processing it could have
received along that path, that is,


$$P = \int_{t_0}^{t_1} \bigl( C(x, t) - W(\dot{x}) \bigr)\, dt \ge K, \tag{21.3}$$


where C(x, t) denotes the density of computation capacity available in office area x
on day t, and $W(\dot{x})$ denotes the computation capacity wasted in moving a folder
from office to office at a speed $\dot{x}$ (cf. [13]). If we consider all possible paths satisfying
inequality (21.3) as equally likely, we’ll have a distribution of paths with P bounded
above and below,

$$\max P \ge P \ge K,$$


where the max is over all paths from $(x_0, t_0)$ to $(x_1, t_1)$. If the peak of this
distribution is in the interior of the interval given by the above bounds, P will be stationary
at the peak. If the peak is sharp, then $P_{\text{peak}} \approx P_{\text{mean}}$, and the path for which P is
stationary will be a good representative of the entire distribution.
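
To see how a local dip in C(x, t) bends the stationary path, one can evaluate the integral in (21.3) over a one-parameter family of paths. Everything specific in the sketch below is an assumption made for illustration: the Gaussian shape and placement of the slow region, the quadratic waste term W(v) = MOVE_COST·v², and the restriction to paths x(t) = a·sin(πt) on 0 ≤ t ≤ 1. It also simply maximizes P over the family rather than sampling a distribution of paths.

```python
from math import cos, exp, pi, sin

C0, DIP, WIDTH = 1.0, 0.8, 0.1  # assumed base capacity, dip depth, dip width
XC, TC = -0.1, 0.5              # assumed center of the slow ("flu") region
MOVE_COST = 0.5                 # assumed coefficient in W(v) = MOVE_COST * v**2


def capacity_density(x, t):
    """C(x, t): uniform capacity with a Gaussian dip around (XC, TC)."""
    r2 = (x - XC) ** 2 + (t - TC) ** 2
    return C0 - DIP * exp(-r2 / (2 * WIDTH ** 2))


def processing(amplitude, steps=2000):
    """Midpoint-rule estimate of P for the path x(t) = amplitude * sin(pi * t)."""
    dt = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        x = amplitude * sin(pi * t)
        v = amplitude * pi * cos(pi * t)  # speed along the path
        total += (capacity_density(x, t) - MOVE_COST * v * v) * dt
    return total


amplitudes = [k / 100 for k in range(-50, 51)]
best = max(amplitudes, key=processing)
print("best amplitude:", best)
print("P(best):", processing(best), " P(straight):", processing(0.0))
```

With the slow region placed slightly to one side of the straight path, the best path in this family bows away from it by a modest amount: a larger detour avoids the dip more completely but spends more capacity on movement.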


By the same token, if we assume that our universe is run as a “numerical
simulation” on a fine-grained parallel processor with fixed texture and a finite density of
computation capacity $C_0$, then we have a qualitative understanding of why a planet
known to start at $(x_0, t_0)$ and to arrive at $(x_1, t_1)$ will go on a curved orbit around the
sun instead of making a straight line. Near the sun, the underlying computational
medium is busy simulating the sun’s gravitational potential φ, and the
computational capacity that remains available for simulating a planet is reduced from the
base value $C_0$ by a factor $e^{-\phi}$. For the planet to go too far from a straight path in
an attempt to avoid the vicinity of the sun will divert good computational resources
into data movement, leading to a point of diminishing returns. Overall, the
simulation will take a planet on the path that will provide it with the largest effective
amount of “computational services”.


Note the continual switching of meaning between ‘path’ intended as a micro- 
scopic history, and ‘path’ intended as the mean or the mode of a distribution of 
microscopic histories. To go back to Lamarck and Darwin, we do the same switch- 
ing when we say that “as the giraffe strove to reach higher leaves, its neck got 
longer.” The giraffes_i (individuals) were indeed striving to reach higher leaves;
some, especially those that had a slightly longer neck, managed more often, fed
better, and preferentially transmitted their longer necks to their progeny. The neck
length of the giraffe_p (gene pool distribution) indeed got longer. Though the phrase
in quotes, which mixes giraffe_i and giraffe_p, is strictly speaking a non sequitur, it is
a convenient abridgement of a valid combinatorial tautology. 

Coming back to our story, if “large amount of computation capacity” is defined
as “large number of different things (such as individual paths) that can be done,”
then it is a matter of tautology that a path ensemble will “strive to go” where there
is the most “free computation capacity.” This is, of course, exactly the same kind of
tautology that we use when we say that “heat flows from $T_1$ to $T_2$ when $T_1 > T_2$.”


21.6 Along the infinite corridor 


The “bureaucracy equivalence principle” introduced in the previous section qualita- 
tively corresponds to the equivalence principle of general relativity. We'd like now 
to investigate in a more quantitative way the amount of computational resources 
diverted for moving data around (cf. the W term in (21.3)); as we shall see, this 
quite closely corresponds to the Lorentz invariance of special relativity. 


We'll introduce another scenario. Along one of MIT’s “infinite corridors” [14] 
there lies a one-dimensional cellular automaton: basically, an indefinitely extended 
row of identical small computers, or “cells,” which communicate only with their 
immediate neighbors. Each unit has a left and a right input and a left and a right 
output; at every clock pulse a new value for the outputs is computed as a function 
$\phi$ of the current value of the inputs; this function is the same for all cells. Viewed
in spacetime, the cellular automaton is but an indefinitely iterated combinational
network, as in Fig. 21.3. The activity of a cell through a clock cycle is represented
by a logic gate $\phi$, which can be thought of as a spacetime event. Two signals flow
into an event, interact through $\phi$, and two new signals flow out. To simplify the
story, we shall assume that signals travel at lightspeed (c = 1) between events, and 
that they flow through an event in a negligible time. 
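
The cell-and-gate picture just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ring of `n` cells, the two-state signals, and the particular gate `phi` are hypothetical choices made here, not the network of Fig. 21.3. The sketch shows that, whatever gate is used, a change injected at one cell cannot affect cells farther away than one cell per clock pulse, which is the discrete counterpart of the light cone.

```python
from typing import Callable, List, Tuple

Gate = Callable[[int, int], Tuple[int, int]]


def phi(left_in: int, right_in: int) -> Tuple[int, int]:
    """Hypothetical gate: maps (left input, right input) to (left output, right output)."""
    return left_in ^ right_in, (left_in & right_in) ^ 1


def step(left_in: List[int], right_in: List[int], gate: Gate) -> Tuple[List[int], List[int]]:
    """One clock pulse on a ring of identical cells."""
    n = len(left_in)
    outs = [gate(left_in[i], right_in[i]) for i in range(n)]
    # A cell's right output feeds the left input of its right neighbor,
    # and its left output feeds the right input of its left neighbor.
    new_left_in = [outs[(i - 1) % n][1] for i in range(n)]
    new_right_in = [outs[(i + 1) % n][0] for i in range(n)]
    return new_left_in, new_right_in


def differing_cells(n: int, origin: int, ticks: int, gate: Gate = phi) -> List[int]:
    """Cells whose inputs differ, after `ticks` pulses, between two runs that
    start out identical except for one signal injected at cell `origin`."""
    a = ([0] * n, [0] * n)
    b = ([0] * n, [0] * n)
    b[0][origin] = 1
    for _ in range(ticks):
        a = step(a[0], a[1], gate)
        b = step(b[0], b[1], gate)
    return [i for i in range(n) if (a[0][i], a[1][i]) != (b[0][i], b[1][i])]
```

Calling `differing_cells(41, 20, 10)` returns only cells within ten positions of cell 20, regardless of which gate is plugged in.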


Our scenario involves Jane—a math professor—and Bob—a computer science 
student. Jane needs some computations done using the cellular automaton, and asks 
Bob to write a program for this task. The cellular automaton lies along the corridor 
on which Jane’s office is located. Whenever Jane wants to run the program p on a 
given argument a she injects the data string pa (the concatenation of p and a) into 
that portion of the cellular automaton that lies just outside her door (Fig. 21.9). From
there, computational activity spreads right and left through the cellular automaton 
as time advances. When the result b = F(a) is ready it will be displayed in the 
same place (the result will be recognized as such by a distinguished pattern, such 
as a “result ready” flag). We assume that the sizes of program and argument are 
negligible with respect to the extent of the computation both in space and time; 
it is thus immaterial whether data injection is serial or parallel, and we shall treat 
input injection as a localized event having no spacetime extension. Likewise for 
output extraction (Fig. 21.9). Clearly, only the portion of the network that is in the 
absolute past (backward light cone) of the result can affect the latter. Likewise, if 
the cellular automaton starts in a blank state (except for program and argument), 
or the program is able to ignore any preexisting data contained in it, then only the 
portion of the network that is in the absolute future (forward light cone) of the 
argument can be relevant to the computation. This is suggested by the diamond 
shape in Fig. 21.9.

Jane waits outside her door for the result. As soon as this appears, she looks
at the campus clock on the corridor’s wall and records the time $t$ elapsed since the
beginning of the computation.


[Figure 21.9: spacetime diagram; labels: “computation”, “program+argument”, “Jane’s door”.]


Fig. 21.9. A computation consists of injecting program and argument into the cellular 
automaton, and extracting the result at a later time. If input and output strings are short 
with respect to the spacetime extent of the computation (indicated by the diamond), input 
and output can be treated as localized, pointlike events. 


Jane is demanding; she keeps asking Bob to improve the running time of the
program. Finally, Bob produces a provably optimal program, guaranteed to give
the result in the shortest possible time, $t_0$.


Now, when she’s thinking, Jane likes to pace along the corridor at a speed $\beta$;
after injecting the argument $a$ she'd like to go on pacing rather than having to wait
in place. Thus, she asks Bob to revise the program so that the output will appear
wherever she is at that moment, i.e., on the worldline $x = \beta t$ rather than on the
line $x = 0$.


What Bob does, as a first try, is add to the program a little “tag” routine: the
result $b$ is produced as before on the line $x = 0$ by program $p$, but as soon as
this result is ready the routine takes over and makes the result shift at light speed
toward Jane until it reaches her, as illustrated in Fig. 21.10.


Of course, in this way the result does not reach Jane as soon as it is ready, but
a little later, namely, when the light ray $x = t - t_0$ meets her worldline
$x = \beta t$, that is, at a time

$$t_{\text{tag}} = \frac{t_0}{1 - \beta}.$$


Jane is not amused: “I want you to write a new program, one that aims the result
directly along my worldline,” she says, “so that I will see it as soon as it is available.”
Bob obliges, and produces a program $p_\beta$ that has the same input/output behavior
as $p$, except that the result appears on the $x = \beta t$ line. “Is this the best you can
do?” asks Jane. “The time $t_{\text{soft}}$ at which the result appears now, though shorter
than $t_{\text{tag}}$, is still longer than $t_0$.” Bob, however, claims that the new program
is also optimal; the slowdown seems to be a necessary consequence of the constraint
$x = \beta t$. “Nonsense,” says Jane, “there must be something wrong with your program!”
Frustrated, Bob hits upon a brilliant idea. He goes back to the original program
$p$, but, as Jane walks along the corridor at speed $\beta$, he pushes the whole cellular
automaton along at the same speed: now the result will certainly be at the right
place at the right time.


Fig. 21.10. Three ways to have the result appear on Jane’s worldline $x = \beta t$ rather than
on the line $x = 0$. (1) Add to the program a “tag” so that as soon as the result is obtained
(at time $t_0$) the computation switches to a “shift right” mode which moves the result
rightwards at light speed (dotted line) until it encounters Jane’s line at a time $t_{\text{tag}}$. (2)
Modify the program, possibly in a nontrivial way, so that the result will appear on Jane’s
worldline at the earliest possible time, $t_{\text{soft}}$. (3) Leave the program unchanged, but slide
the whole hardware at a speed $\beta$; according to special relativity, the result will appear at
time $t_{\text{hard}} = t_0/\sqrt{1 - \beta^2}$.


Surprisingly, things do not work quite that way! The result takes a time $t_{\text{hard}}$
that is perhaps a bit shorter than $t_{\text{soft}}$, but still longer than $t_0$ (Fig. 21.10). Bob
consults with Karl, a physics student. “Of course,” says Karl, “you'll get a longer
time; this is an elementary case of relativistic slowdown. In fact, your $t_{\text{hard}}$ must
equal $t_0/\sqrt{1 - \beta^2}$.” Measurements are made, and that is indeed the case.


To sum up, we have a program $p$ for the given cellular automaton, which delivers
the result $b = F(a)$ at the same place where the argument was injected ($x = 0$) at
some time $t_0$. If we want the same result delivered along Jane’s worldline $x = \beta t$
we can either modify the software, that is, run a new program $p_\beta$ on the stationary
cellular automaton, or run the old program $p$ on a different hardware, that is, on
the cellular automaton pushed at speed $\beta$. In both cases the result is obtained at
a somewhat later time, i.e., at times $t_{\text{soft}}$ and $t_{\text{hard}}$ respectively. Though for small
$\beta$ the slowdown in the two cases is comparable, the reasons for it appear to be
very different. In the hardware solution, we run the same program on a physically
modified hardware, and as a consequence we have a physical effect, accounted for
(though not really explained) by special relativity. In the software solution, in order
to produce a moving rather than a stationary result we run a different program on
the same hardware, and this may well entail a different computational budget.


Are the two reasons really different? In other words, couldn’t relativity itself be,
at bottom, a way to describe how to reprogram the same underlying microscopically-grained
hardware for different effects? In this case we wouldn’t be surprised to arrive
at similar resource tradeoffs in the two situations. Let’s take a closer look at the
picture.


Fig. 21.11. The locus of the point $P_{\text{soft}}$, as $\beta$ ranges from 0 to 1, must lie in the region
bounded by the light cone with vertex at the origin ($t = x$), the constant line $t = t_0$, and
the locus of $P_{\text{tag}}$ (or the light cone with vertex at $P_0$). As we shall see, for almost all
programs it will lie above the locus of $P_{\text{hard}}$ (the relativistic hyperbola with apex at $P_0$),
and will tend to be close to it.


With reference to Fig. 21.11, from relativity we know what must be the curve
described by the point $P_{\text{hard}}$ as $\beta$ goes from 0 to 1, namely, the hyperbola
$t^2 - x^2 = t_0^2$. Let us derive some bounds for the curve described by $P_{\text{soft}}$. (a) The point
$P_{\text{soft}}$ must lie inside the light cone (i.e., above the lightlike line $x = t$), because that is
where the causal consequences of program and argument (injected at the origin) are
felt. (b) It can’t lie below the line $t = t_0$, because Bob proved
that $t_0$ is optimal in the absence of constraints. (c) Finally, it can’t lie above the
line $t = t_0 + \beta t$ described by $P_{\text{tag}}$, because the original program followed by the tag
routine supplies a solution with that value of $t$. The border of the bounded area is indicated by a
thick solid line in the figure; note that the relativistic hyperbola also fits well within
these bounds.
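
The bounds can be checked numerically. The following is a minimal sketch; the function names are chosen here for illustration, and the tag time is taken to be the instant at which a light ray leaving $x = 0$ at time $t_0$ catches up with the worldline $x = \beta t$, i.e. $t_0/(1-\beta)$.

```python
import math


def t_tag(t0: float, beta: float) -> float:
    """Arrival time when the result is shifted at light speed after t0."""
    return t0 / (1.0 - beta)


def t_hard(t0: float, beta: float) -> float:
    """Arrival time when the whole automaton is pushed at speed beta."""
    return t0 / math.sqrt(1.0 - beta * beta)


def soft_window(t0: float, beta: float) -> tuple:
    """Earliest and latest admissible arrival times for the software solution."""
    return t0, t_tag(t0, beta)


if __name__ == "__main__":
    for beta in (0.1, 0.5, 0.9):
        lo, hi = soft_window(1.0, beta)
        print(beta, lo, t_hard(1.0, beta), hi)
```

For every speed below that of light the hardware time falls inside the window, because $\sqrt{1-\beta^2} \ge 1-\beta$ whenever $0 \le \beta < 1$.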

Let us look first at the $P_{\text{hard}}$ case, in which the entire cellular automaton is
pushed at a speed $\beta$. In this case the metric of the spacetime lattice of Fig. 21.3
is distorted, as shown in Fig. 21.12, but its topological structure (which nodes 
communicate with which) is not altered. Thus, the causal chaining of events is 
invariant: for the same program and argument, homologous arcs will carry identical 
signals, and of course the result will be the same in the two cases. 


Fig. 21.12. When the cellular automaton is pushed at a speed $\beta$ (indicated by the dashed
line in the spacetime diagram), the events corresponding to the gates of the combinational
network of Fig. 21.3 are displaced in space and time, but the causal chaining of events is
unaltered.


Moreover, even though position and timing of events will be different, the hardware
solution of Fig. 21.13b yields the same density of gates in spacetime (the gates
are just less crowded in the $x = t$ direction and more crowded, by the same factor, in
the $x = -t$ direction, as shown in Fig. 21.13), as well as the same spacetime volume
(both quantities are relativistic invariants), and thus the same overall number of
gates as the static ($P_0$) case of Fig. 21.13a. In terms of fungibility of computational
resources, if we were leasing the cellular automaton by the hour we couldn’t care
less if Jane put it on wheels. If, when walking, Jane looked at her own watch instead
of the corridor clocks, she would see the result $b = F(a)$ come in a time $t_0$, as in the
static case. It’s only when Jane tries to make the machine do double duty (as a
telegraph line between different corridor locations as well as a computer) that she
loses on one count what she gains on the other.
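
These invariance claims can be illustrated with the standard Lorentz boost in units where $c = 1$. The sketch below is an illustration only; the function names are assumptions made here. It maps an event from the frame of the moving automaton to the corridor frame and reports the stretch factors along the two lightlike directions.

```python
import math


def boost(t: float, x: float, beta: float) -> tuple:
    """Map an event (t, x) in the automaton's rest frame to the corridor frame."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (t + beta * x), gamma * (x + beta * t)


def stretch_factors(beta: float) -> tuple:
    """Scale factors along the x = t and x = -t directions under the boost."""
    k = math.sqrt((1.0 + beta) / (1.0 - beta))
    return k, 1.0 / k


if __name__ == "__main__":
    t0, beta = 1.0, 0.6
    t, x = boost(t0, 0.0, beta)      # the "result ready" event
    print(t, x, x / t)               # lands on Jane's worldline x = beta * t
    print(t * t - x * x)             # invariant interval: Jane's watch reads t0
    print(stretch_factors(beta))     # reciprocal factors, so area is preserved
```

Since the two stretch factors are reciprocal, any patch of spacetime keeps its area, and with it its gate count, while the corridor clock reads $t_0/\sqrt{1-\beta^2}$ where Jane's own watch reads $t_0$.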


Let us now look at the $P_{\text{soft}}$ case (Fig. 21.13c), in which we run our computation
on the same network texture as in the static case, but running a different program
and using a different swatch of spacetime, indicated by the dashed outline. We have
also indicated, within the dashed area, a solid area of the same size and shape as
that used in the $P_{\text{hard}}$ case. This spacetime volume contains the same number of
gates as in the $P_0$ and $P_{\text{hard}}$ cases, but they are arranged in a less favorable aspect
ratio (more depth and less width, corresponding to more emphasis on serial than
parallel processing). In this degraded computational context, it is not surprising
that a little more spacetime volume may be needed to complete the task, stretching
the result as late as $P_{\text{soft}}$. This argument proves⁸ that

$$P_{\text{soft}} \ge P_{\text{hard}}. \tag{21.4}$$


Can equality ever be attained in (21.4)? This is another interesting story. Imagine
that what travels on the lines of the network (cf. Fig. 21.3) is tokens, the 0 state
corresponding to the absence of a token and the other $n - 1$ signal states
corresponding to tokens of different kinds. A spacetime patch with all 0s in it will then
correspond to empty space. Jane’s requirement that Bob’s program be optimal
entails, of course, that the computation space will be far from empty (using a density
of 0s greater than $1/n$ will waste information capacity). On the other hand, if Jane
insisted instead on a simple, orthogonal, well-structured program, one that doesn’t
take advantage of any ad hoc tricks just because they are available (cf. [15]), then
the program will most likely leave plenty of empty space. In such a low-density limit
it can be proved [10] that one can always go from a program $p$ to an equivalent
program $p_\beta$ (cf. Fig. 21.10) by means of a straightforward transliteration process, and
that the resulting program will make $P_{\text{soft}}$ coincide with $P_{\text{hard}}$ (equality in (21.4));
moreover, the transliteration process itself consists in just subjecting some of the
program variables to a Lorentz transformation by a velocity $\beta$. In the low-density
limit, therefore, the hard and the soft approach are completely equivalent, and
computation obeys the same rules of special relativity as physics does. If the density of
tokens (“energy density” in the network) is not close to zero, the behavior of this
computational model begins to deviate from special relativity, but doesn’t physics
do that too?


⁸This is, of course, an ‘almost always’ result. In vanishingly rare computational tasks the new
form factor may be an asset rather than a liability.


[Figure 21.13: panels (a), (b) and (c); labels $t_0$, $P_0$, $P_{\text{hard}}$.]

Fig. 21.13. Spacetime volume and computational network texture used in the original
computation (a), the “hardware fix” (b), and the “software fix” (c). In the limit of low
density of utilization of computational resources, the $P_{\text{hard}}$ and $P_{\text{soft}}$ solutions can be made
to coincide.


21.7 Action and Lagrangian 


There are many leads that point to action as the right kind of physical dimension 
to measure “computation capacity”. 


Suppose somebody is in the business of leasing suitable pieces of spacetime for
computational use. The “tenant” shall rent an “apartment” for a time $T$, and fill
the leased volume $V$ with physical machinery in any way he pleases, trying to get
the most computation out of his real estate. Consider, for example, a “billiard-ball”
model of computation [8] where logic signals (cf. Fig. 21.3) are represented by
particles in free motion and logic interactions are realized by collisions between
particles. It is reasonable to use the number of collisions as an indicator of the amount
of computation performed. If a particle hits a boundary of the leased volume, it is
the duty of the landlord to send it back in, since otherwise this particle will intrude
into somebody else’s real estate and disrupt their computation (“landlord shall
assure privacy!”). The lease contract will specify not only $V$ and $T$, but also what
pressure $P$ the walls must withstand: an apartment with harder walls will rent for
more. Simple additivity arguments show that an open market rental fee must be of
the form $PVT$, which has the dimensions of action. What the tenant actually pays
for is a number

$$N = PVT/h$$

($h$ is Planck’s constant) of “unit action cells”. How many collisions can he achieve?


Suppose the tenant fills his real estate with $n$ particles of radius $r$ and mass $m$,
traveling at speed $v$. The pressure will be

$$P = nmv^2/V,$$

and the number of particle collisions will be

$$N_{\text{actual}} = \frac{nr^3}{V}\,\frac{h}{mvr}\,N.$$

The first factor is the fraction of space actually filled with machinery; it is in the
tenant’s interest (but none of the landlord’s business) to fill the rented space as fully
as possible. The second factor tells how much smaller the unit action cell is than
the effective collision size (in phase space) achieved by the user, namely, particle
momentum times particle radius; again, it is in the tenant’s interest (but none of
the landlord’s business) to make this ratio as close as possible to unity. Thus the
fair rental fee, in Planck’s unit of account, coincides with the maximum number of
collisions the tenant can achieve.
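
The bookkeeping of this argument can be sketched as follows. It is an order-of-magnitude illustration only: the two factors are read as $nr^3/V$ (fraction of space filled) and $h/(mvr)$ (unit action cell over momentum times radius), geometric constants such as $4\pi/3$ are ignored, and all function and parameter names are chosen here for illustration.

```python
def rented_cells(pressure: float, volume: float, duration: float, h: float) -> float:
    """Number of unit action cells paid for: N = P V T / h."""
    return pressure * volume * duration / h


def collisions(n: float, r: float, m: float, v: float,
               volume: float, duration: float, h: float) -> float:
    """Collision count obtained from N by applying the two efficiency factors."""
    pressure = n * m * v ** 2 / volume        # P = n m v^2 / V
    fill_fraction = n * r ** 3 / volume       # fraction of space filled with machinery
    cell_ratio = h / (m * v * r)              # unit action cell vs. momentum times radius
    return fill_fraction * cell_ratio * rented_cells(pressure, volume, duration, h)


if __name__ == "__main__":
    print(collisions(n=10, r=0.1, m=2.0, v=3.0, volume=5.0, duration=7.0, h=0.5))
```

With these readings Planck's constant cancels, and the product reduces to $n^2 r^2 v T / V$, the familiar kinetic estimate: $n$ particles, each sweeping a cross-section of order $r^2$ at speed $v$ through a gas of density $n/V$ for a time $T$.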


Another argument, due to Margolus and Levitin [16], runs as follows. The average
energy $E$ of a quantum system coincides with the maximum rate of dynamical
evolution, i.e., with the rate at which the system can successively go through a 
sequence of orthogonal states. Thus the quantity N = ET, expressed in Planck’s 
units of account, is the maximum number of distinct states that can be touched 
during a computation. Since some quantum states may be strung by the dynamics 
into orbits much shorter than N (say, an orbit of period 3), the average number of 
distinct states touched by the computation will be less than N. It will be the user’s 
responsibility to design a dynamics with high enough ergodicity (cf. §21.4) to take 
full advantage of the energy invested in it—a computer that will not enter a cycle 
too soon. 


However, dimensional arguments of this kind are too generic for our purposes.
We would like to argue not only that action, as energy × time, is the right yardstick
to measure ‘amount of computation’ with, but that a very specific quantity
having the dimension of action, namely, that obtained by integrating a system’s
Lagrangian over time, corresponds to the amount of computational work performed
by the system during its time evolution. In turn, we'd like to define this ‘computational
work’ as the log of the number of different behaviors that the system could
have exhibited, just as the information content of a message is defined as the log
of the number of different things that the message could have said. While entropy
measures uncertainty as to state, we'd like to argue that (Lagrangian) action
measures uncertainty as to law. Thus, instead of counting states we will be counting
laws. The next section is meant for flexing our muscles in this respect.


One might say that the job is already done. In fact, Feynman’s theory of path 
integrals in quantum mechanics [17] provides a satisfying mathematical model of 
why a classical trajectory should satisfy a principle of stationary action: action af- 
fects quantum phase and thus quantum amplitude, quantum amplitudes reinforce 
or cancel one another according to their phases, the square of the overall amplitude 
gives a probability, and a sharply peaked probability gives a classical trajectory. 
The implication of this approach, however, is that the variational principles of clas- 
sical mechanics can be explained only after the introduction of quantum mechanics 
(which, by the way, is still in search of an “explanation”). Though it may turn out
that quantum mechanics is an essential element of the explanation, I rather suspect 
that the connection between variational principles and counting arguments (which 
is so solidly established in statistical mechanics) is much more general than that, 
and may hold at least in a qualitative way in analytical mechanics without bringing 
in quantum mechanics. Richard Feynman himself, even after having stalked action 
at closer distance than any other mortal through his theory of path integrals, would 
still say “I don’t know what action is”. Apparently, there are more veils to be lifted. 
By contrast, today we can confidently say that “we know what entropy is.” 


21.8 T=dS/dE holds for almost any system 


Statistical mechanics gets distinguished macroscopic properties (e.g., equilibrium 
parameters) by applying counting arguments to ensembles of microscopic states. 
Here we give an example of how general properties of analytical mechanical laws 
may arise by applying counting arguments to “ensembles” of microscopic laws. 


In re-reading Arnold [18], I started wondering about the “philosophical meaning”
of the following. Consider a continuous system with one degree of freedom. Let $T$
be the period of a given orbit of energy $E$, and $dS$ the volume of phase space swept
when the energy of the orbit is varied by an infinitesimal amount $dE$ (Fig. 21.14).
As is well known, if the system is Hamiltonian these quantities obey the relation

$$T = dS/dE. \tag{21.5}$$
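
As a concrete instance of (21.5), one may take the harmonic oscillator. This is a standard textbook case rather than an example drawn from the text; the symbols $m$ (mass), $\omega$ (angular frequency), $q$ and $p$ (position and momentum) are introduced here only for illustration.

```latex
\begin{align*}
H(q,p) &= \frac{p^2}{2m} + \frac{1}{2} m\omega^2 q^2 = E
  && \text{(an ellipse with semi-axes } \sqrt{2mE} \text{ and } \sqrt{2E/(m\omega^2)}\text{)} \\
S(E) &= \pi\,\sqrt{2mE}\,\sqrt{\frac{2E}{m\omega^2}} = \frac{2\pi E}{\omega}
  && \text{(phase-space area enclosed by the orbit)} \\
\frac{dS}{dE} &= \frac{2\pi}{\omega} = T
  && \text{(the period, independent of } E\text{)}
\end{align*}
```
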
Just how surprising is this fact? 


One way to answer this question is to go to a more accurate physical level
(quantum mechanics) and see how energy and rate of evolution are related there
(see [16]). Here instead our approach to the question is to consider systems of a
more general nature, but for which quantities analogous to $T$, $dE$, and $dS$ are still
meaningful. Under what conditions will the above relation hold for these systems?


Fig. 21.14. In moving from an orbit of energy $E$ to an infinitesimally near orbit of energy
$E + dE$, one sweeps an area $dS$ of the phase space. If the dynamics is Hamiltonian, the
orbit period $T$ equals the ratio $dS/dE$ of action-to-energy variation.


Take, for instance, the class $\mathcal{X}_N$ of all discrete systems having a finite number
$N$ of states and an invertible but otherwise arbitrary dynamics. Though continuous
quantities such as those appearing in (21.5) may arise from discrete ones in
the limit $N \to \infty$, in general relation (21.5) will not hold (or even be meaningful)
for every individual system; however, if one considers the entire class, one may ask
whether this relation holds approximately for most systems of the class. Alternatively,
one may ask whether this relation holds for a suitably-defined “average”
system, treated as a representative of the whole class. This kind of approach is
routinely used in statistical mechanics;⁹ in our context, however, statistical methods
are applied to “ensembles” in which the missing information that characterizes
the ensemble concerns a system’s law rather than its initial state.


We shall show that relation (21.5) holds for the average element of the “ensemble”
$\mathcal{X}_N$. Now, this is surprising, as we thought we hadn’t told $\mathcal{X}_N$ anything about
physics!


The systems of class $\mathcal{X}_N$ have very little structure: basically, just invertibility.
Nonetheless, one can still recognize within them the precursors of a few fundamental
physical quantities. For instance, the period $T$ of an orbit is naturally identified
with the number of states that are strung along the orbit. Likewise, a volume $S$ of
state space, which in our case is an unstructured “bag” of states, will be measured
in terms of how many states it contains. It is a little harder to identify a meaningful
generalization of energy; the arguments presented at the end of this section suggest
that in this case the correct identification is $E = \log T$, and this is the definition
that we shall use below. Armed with the above “correspondence rules,” we shall
investigate the validity of relation (21.5).


⁹For example, given a canonical ensemble for a system consisting of an assembly of many
identical subsystems, almost all elements of the ensemble display a subsystem energy distribution
that is very close to the Boltzmann distribution; the latter can thus be taken as the “representative”
subsystem-energy distribution, even though hardly any element of the ensemble displays that
distribution exactly.


Each system of $\mathcal{X}_N$ will display a certain distribution of orbit lengths; that is,
one can draw a histogram showing, for $T = 1, \ldots, N$, the number $n(T)$ of orbits of
length $T$ (see Fig. 21.15).


Fig. 21.15. Orbit-length histogram $n(T)$, for $T = 1, 2, 3, \ldots$, of one particular system.
The dashed curve gives the average histogram over the entire set of $N!$ systems.


If in this histogram we move from abscissa $T$ to $T + dT$ we will accumulate a count
of $n(T)\,dT$ orbits. Since each orbit contains $T$ points, we will sweep an amount of
state space equal to $dS = T\,n(T)\,dT$; thus

$$\frac{dS}{dT} = T\,n(T).$$

On the other hand, since $E = \log T$,

$$\frac{dT}{dE} = T;$$

hence

$$\frac{dS}{dE} = \frac{dS}{dT}\,\frac{dT}{dE} = T^2\,n(T).$$

Therefore, the original relation (21.5) will hold if and only if the orbit-length
distribution is of the form

$$n(T) = 1/T.$$

Do the systems of $\mathcal{X}_N$ display this distribution?




Observe that, as $N$ grows, the number of systems in $\mathcal{X}_N$ grows much faster than
the number of possible orbit-length distributions: most distributions will occur
many times, and certain distributions may appear with a much greater frequency
than others. Indeed, as $N \to \infty$, almost all of the ensemble’s elements will display
a similar distribution. In such circumstances, well-known theoretical and practical
considerations recommend defining the “typical” distribution as the mean distribution
over the ensemble, denoted by $\bar{n}_N(T)$.


It turns out that for $\mathcal{X}_N$ the mean distribution is exactly

$$\bar{n}_N(T) = 1/T \tag{21.6}$$


for any $N$, as indicated in Fig. 21.15. In fact, we construct a specific orbit of length
$T$ by choosing $T$ states out of $N$ and arranging them in a definite circular sequence.
This can be done in $\binom{N}{T}(T-1)!$ different ways. To know in how many elements of the
ensemble the orbit thus constructed occurs, we observe that the remaining $N - T$
elements can be connected in $(N - T)!$ ways. Thus, the total number of orbits of
length $T$ found anywhere in the ensemble is

$$\binom{N}{T}\,(T-1)!\,(N-T)! = \frac{N!}{T}.$$

Divide by the size $N!$ of the ensemble to obtain $1/T$.
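
For small $N$ the count can be verified by brute force. The sketch below enumerates every invertible dynamics on $N$ states as a permutation, tallies orbit lengths, and averages over the ensemble; the function names are chosen here for illustration, and the enumeration is only practical for small $N$ since it visits all $N!$ systems.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial
from typing import Dict, List, Sequence


def orbit_lengths(dynamics: Sequence[int]) -> List[int]:
    """Lengths of all orbits of an invertible map given as a permutation."""
    seen = [False] * len(dynamics)
    lengths = []
    for start in range(len(dynamics)):
        if seen[start]:
            continue
        length, state = 0, start
        while not seen[state]:
            seen[state] = True
            state = dynamics[state]
            length += 1
        lengths.append(length)
    return lengths


def mean_orbit_histogram(n_states: int) -> Dict[int, Fraction]:
    """Average number of orbits of each length T over all N! systems."""
    totals = [0] * (n_states + 1)
    for dynamics in permutations(range(n_states)):
        for length in orbit_lengths(dynamics):
            totals[length] += 1
    size = factorial(n_states)
    return {T: Fraction(totals[T], size) for T in range(1, n_states + 1)}


if __name__ == "__main__":
    print(mean_orbit_histogram(5))
```

Exact rational arithmetic is used so that the averages can be compared with $1/T$ without rounding.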


Thus, the typical discrete system obeys relation (21.5). Intuitively, when N is 
large enough to make a continuous treatment meaningful, the odds that a system 
picked at random will closely obey (21.5) are overwhelming. 


Why E = log T


Here we motivate the choice $E = \log T$ made above.


Finite systems lack the rich topological structure of the state space found in
analytical mechanics. Besides invertibility, in general the only intrinsic¹⁰ structure
that they are left with is the following: Given two points $a$ and $b$, one can tell
whether $b$ can be reached from $a$ in $t$ steps; in particular (for $t = 0$), one can tell
whether or not $a = b$. Thus, for instance, one can tell how many orbits of period $T$
are present, but of these one cannot single out an individual one without actually
pointing at it, because they all “look the same”.


To see whether there is a quantity that can be meaningfully called “energy” in
this context, let us observe that physical energy is a function $E$, defined on the state
space, having the following fundamental properties:


1. Conservation: $E$ is constant on each orbit (though it may have the same value
on different orbits).


2. Additivity: The energy of a collection of weakly-coupled system components
equals the sum of the energies of the individual components.


3. Generator of the dynamics: Given the constraints that characterize a particular
class of dynamical systems, knowledge of the function $E$ allows one to
uniquely reconstruct the dynamics of an individual system of that class.


¹⁰That is, independent of the labeling of the points, and thus preserved by any isomorphism.


The proposed identification $E = \log T$ obviously satisfies property 1.


As for property 2, consider a finite system consisting of two independent components,
and let $a_0$ and $a_1$ be the respective states of these two components. Suppose,
for definiteness, that $a_0$ is on an orbit of period 3, and $a_1$ on one of period 7; then the
combined system state $(a_0, a_1)$ is on an orbit of length 21, i.e., $\log T = \log T_0 + \log T_1$.
This argument would fail if $T_0$ and $T_1$ were not coprime. However, for two randomly
chosen integers the expected number of common factors grows extremely slowly with
the size of the integers themselves [19] (and, of course, the most likely common factors
will be small integers); thus the departure from additivity vanishes in the limit
$T \to \infty$.
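
The additivity argument is easy to try out. In the sketch below each component is modeled as a hypothetical cyclic counter, the period of the pair is found by stepping both until they return to their starting states together, and the “defect” is the amount by which the logs fail to add; the function names are chosen here for illustration.

```python
from math import log


def combined_period(t0: int, t1: int) -> int:
    """Period of a pair of independent cyclic components of periods t0 and t1."""
    a, b, steps = 0, 0, 0
    while True:
        a, b, steps = (a + 1) % t0, (b + 1) % t1, steps + 1
        if a == 0 and b == 0:
            return steps


def additivity_defect(t0: int, t1: int) -> float:
    """log T0 + log T1 - log T for the combined system; zero when energies add."""
    return log(t0) + log(t1) - log(combined_period(t0, t1))


if __name__ == "__main__":
    print(combined_period(3, 7), additivity_defect(3, 7))
    print(combined_period(4, 6), additivity_defect(4, 6))
```

The combined period is the least common multiple of the two periods, so the defect equals the log of their greatest common divisor: it vanishes for coprime periods and is $\log 2$ for periods 4 and 6.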


As for property 3, an individual system of $\mathcal{X}_N$ is completely identified (up to
an isomorphism) by its orbit distribution $n(T)$, and thus any “into” function of $T$
(in particular, $E = \log T$) satisfies this property.


21.9 The power of conditioning 


Much as in thermodynamics a “state”—i.e., a macroscopic state—is in fact a bag 
of microscopic states, we shall entertain the notion that, in analytical mechanics, 
a “trajectory” of our dynamics is really a bundle of microscopic trajectories of an 
underlying, fine-grained dynamics. How does the size of this bundle depend on what
we know about the (macroscopic) trajectory? 


Let us consider a particle that at every tick of the clock can move one notch right 
or left on a one-dimensional track. The particle is driven by a computer program. 
To be specific, there is a given computer with a large amount of memory in it; 
somebody initializes the memory and then lets the computer run. They fix their 
attention on a specific data bit. Every billionth cycle of the computer they look at 
the state of this bit: if it is a 1 they will move the particle to the right; if a 0, to 
the left. At time 0 we put the particle at notch 0 and leave the room. At time
T (say, some 10,000 ticks) we are asked, “Where is the particle now?” 


How could we possibly know? We know the computer but we do not know what 
program it is running or what the initial data were. Actually, we know one hard
fact: at time T the particle must be within the interval [-T,+T], since it cannot
move more than one notch at a time. Beyond that, what can we say? If hard 
pressed, we may hazard a guess that the particle is probably closer to the middle 
of the interval than to the very ends. 


They let us go. At time 2T they are at it again. This time they tell us where 
the particle is now: at a certain position 2X (with, say, X = 8000); but they ask 
for the same information as before, namely, “Where was the particle at time T?” 
And, at gunpoint, “A bad answer, and you are dead!” 


Let us think rationally. We could simulate the given computer on our laptop 
PC, trying out all the programs one by one. Perhaps, by the way the computer is 
designed, there are some positions where the particle can never be at time T no 
matter what program is running; that might help us narrow down the choice by a 
tiny bit. But the computer may have a billion bits, and so may be running any
of $2^{1,000,000,000}$ programs; this is not a promising approach, since we'll certainly be
dead before we've tried out even a small fraction.


Then we realize that, no matter how many programs there might be, the number
of possible paths the particle may have followed from the start to time $T$ is “only”
$2^T$. Forget the red herring of the $2^{1,000,000,000}$ programs: our chances of survival
are no worse than one in $2^T$, and $T$ is in the thousands rather than in the billions.
Even better, the number of places where the particle can be at $T$ (rather than the
paths from 0 to $T$) can only grow linearly with $T$; in fact, it is no more than $T + 1$.
That is, our chances cannot possibly be worse than one in $T$ ($\approx 2^{13}$), even if the
particle were equally likely to be in any of the possible slots. With a not too large
$T$ in the denominator, perhaps we can put some large constant in the numerator
and manage to get an appreciable chance of survival. Let us do some figuring, but
without assuming anything more than we know.
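
The two counts can be confirmed by brute force for a small number of ticks. This is a minimal sketch with a function name chosen here for illustration; it enumerates every left/right sequence, so it is only usable for small values.

```python
from itertools import product
from typing import Tuple


def count_paths_and_endpoints(ticks: int) -> Tuple[int, int]:
    """Number of distinct paths, and of distinct final positions, after `ticks` steps."""
    paths = 0
    endpoints = set()
    for steps in product((-1, +1), repeat=ticks):
        paths += 1
        endpoints.add(sum(steps))
    return paths, len(endpoints)


if __name__ == "__main__":
    print(count_paths_and_endpoints(10))
```

The path count doubles with every tick, while the number of reachable positions grows by one, because only positions with the same parity as the number of ticks can be reached.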


The worst scenario, and by Murphy’s law (or Jaynes’s law [20]) the one that
we owe it to ourselves to entertain unless we want to deceive ourselves, is that all
particle paths compatible with our information are equally probable. Forget about
the computer program, about which we do not know anything besides that it is
large, and concentrate on the kinematic constraints. For a specific position $x$, how
many paths go from event $P_0$ (particle at the origin at time $t = 0$) through event
$P_1$ (particle at position $x$ at time $T$) and event $P_2$ (particle at position $2X$ at time
$2T$)?


In general, if the net progress during a time interval $t$ is $x$, and this was done
by $t_+$ steps to the right and $t_-$ to the left, we must have

$$t_+ + t_- = t, \qquad t_+ = (t + x)/2 = \frac{1+\beta}{2}\,t,$$
$$t_+ - t_- = x, \qquad t_- = (t - x)/2 = \frac{1-\beta}{2}\,t,$$

where we have called $\beta$ the average velocity $x/t$ in that interval. Then, the number
of paths is¹¹

$$N(\beta, t) = \binom{t}{\frac{1+\beta}{2}t,\ \frac{1-\beta}{2}t}. \tag{21.7}$$


Denote by H(p) the binary entropy function

$$H(p) = -p \log p - q \log q,$$

that is, the entropy of the binary distribution {p, q}, with q = 1 − p. This entropy can
be rewritten more symmetrically in terms of the mean $\mu = p - q$ of the distribution, as

$$K(\mu) = H(p(\mu)), \quad \text{where } p = (1 + \mu)/2.$$

Using Stirling's approximation, we get from (21.7)

$$\frac{1}{n} \log N(\beta, n) = K(\beta) + O\!\left(\frac{\log n}{n}\right); \tag{21.8}$$

the “big-oh” term in (21.8) vanishes as $n \to \infty$. Some salient features of $K(\beta)$ are
shown in Fig. 21.16.
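
As a numerical sanity check of (21.8), the following minimal sketch compares the exact log-count of paths with the entropy $K(\beta)$. It assumes natural logarithms throughout; the function names `K` and `log_paths` are illustrative and not taken from the text.

```python
import math

def K(beta):
    """Binary entropy (natural log) of the distribution with mean beta = p - q."""
    p, q = (1 + beta) / 2, (1 - beta) / 2
    return -sum(w * math.log(w) for w in (p, q) if w > 0)

def log_paths(beta, n):
    """ln N(beta, n): log of the binomial coefficient C(n, (1 + beta) n / 2)."""
    right = round((1 + beta) * n / 2)          # number of steps to the right
    return math.lgamma(n + 1) - math.lgamma(right + 1) - math.lgamma(n - right + 1)

# The gap between K(beta) and (1/n) ln N(beta, n) should shrink as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, K(0.8) - log_paths(0.8, n) / n)
```

Using `math.lgamma` rather than explicit factorials keeps the computation cheap even for a million steps.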


Fig. 21.16. The binary entropy $K(\beta)$ compared with the parabola osculating it at the
apex, and with the relativistic factor $\sqrt{1 - \beta^2}$ (a half-circle).


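If we call $\beta$ the average velocity of the overall trip (from 0 to 2T), $\beta_1$ that of the
first lap (from 0 to T), $\beta_2$ that of the second lap (from T to 2T), and $\epsilon$ the excess of
$\beta_1$ over $\beta$, we have

$$\beta = X/T, \qquad \beta_1 = \beta + \epsilon,$$
$$\epsilon = (x - X)/T, \qquad \beta_2 = \beta - \epsilon.$$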


¹¹ If n = h + k, we may write $\binom{n}{h,\,k}$ for $\binom{n}{h}$ to stress the symmetry between h and k.

Coming back to our question, the number of paths from $P_0$ to $P_2$ through $P_1$ is

$$N_{012} = N_{01} N_{12} = N(\beta_1, T)\,N(\beta_2, T) = N(\beta + \epsilon, T)\,N(\beta - \epsilon, T),$$

while that from $P_0$ to $P_2$ (with no restrictions on intermediate points) is

$$N_{02} = N(\beta, 2T).$$


Thus, the probability P that the particle went through $P_1$, conditioned by the
knowledge that it started at $P_0$ and ended up at $P_2$, is

$$P = \frac{N_{012}}{N_{02}} = \frac{N(\beta + \epsilon, T)\,N(\beta - \epsilon, T)}{N(\beta, 2T)}.$$

Taking logs and dividing by the number of steps as in (21.8) we obtain a quantity
$R(\epsilon)$ proportional to the log of the relative frequency of the paths going through x
at T, namely,

$$R(\epsilon) = K(\beta + \epsilon) + K(\beta - \epsilon) - 2K(\beta).$$

Now, it turns out that, for small (and even not so small) values of $\beta$,

$$K(\beta) \approx -\tfrac{1}{2}\beta^2 + \text{const} \tag{21.9}$$

(cf. Fig. 21.16). Thus, up to an additive constant,

$$R(\epsilon) \approx -[(\beta + \epsilon)^2 + (\beta - \epsilon)^2 - 2\beta^2]/2 = -\epsilon^2.$$


The maximum for R is clearly at $dR/d\epsilon = 0$, which occurs for $\epsilon = 0$, that is, for
$\beta_1 = \beta_2 = \beta$. Thus we conclude that the most likely place for the particle to be at
time T is X, i.e., halfway between its initial position 0 and its final position 2X, as
if it had traveled at a uniform velocity $\beta = X/T$. In other words, the most likely
position for event $P_1$ is on the straight spacetime line between $P_0$ and $P_2$, as
shown in Fig. 21.17. This generalizes to any choice of initial and final positions, and
of any instant of time for which the position of the particle is guessed (Fig. 21.18).
Incidentally, with the given data, the standard deviation from this straight line at
midtime is about 40 notches; with our guessing system, the probability to come out
alive is about 0.014, or better than one in a hundred.
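
The counting argument can be checked directly. The sketch below evaluates the exact midtime distribution $N_{012}/N_{02}$ for the example's data (T = 10,000, X = 8,000, as given in Fig. 21.18), assuming only that all paths compatible with the endpoints are equally likely. The function names are illustrative; no claim is made here about the guessing system behind the survival figure quoted above.

```python
import math

def log_paths(steps, net):
    """ln of the number of +/-1 paths with the given step count and net progress."""
    right = (steps + net) // 2
    return math.lgamma(steps + 1) - math.lgamma(right + 1) - math.lgamma(steps - right + 1)

def midpoint_distribution(T, X):
    """P(position x at time T | start at 0, end at 2X at time 2T)."""
    lo, hi = max(-T, 2 * X - T), min(T, 2 * X + T)       # both laps must be feasible
    xs = [x for x in range(lo, hi + 1) if (x - T) % 2 == 0]  # x has the parity of T
    total = log_paths(2 * T, 2 * X)
    probs = [math.exp(log_paths(T, x) + log_paths(T, 2 * X - x) - total) for x in xs]
    return xs, probs

xs, probs = midpoint_distribution(T=10_000, X=8_000)
mean = sum(x * p for x, p in zip(xs, probs))
std = math.sqrt(sum((x - mean) ** 2 * p for x, p in zip(xs, probs)))
peak = xs[probs.index(max(probs))]
print("most likely x:", peak, " mean:", round(mean, 3), " std:", round(std, 1))
```

The peak and the mean both land on x = X, and the spread comes out at roughly 40 notches, in line with the figure quoted in the text.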


Note that we never suggested that the particle must have had some “momentum”
that made it try to preserve its “speed”, or that, since the particle moved to the
right 9 times out of 10, the computer program, seen as a random-number generator,
must have been one characterized by a certain biased “statistics”. We did not make
hypotheses about where the particle was when we were not seeing it; all we used
was the little we were told, namely, where the particle was at t = 0 and t = 2T, and
that it was hopping right or left one notch at a time (the latter is a crucial piece
of information). Our prediction is the “flattest” that one could make out of the
available data, and thus is just a tautological rewording (though an illuminating
one) of these data.

Fig. 21.17. For a particle that performs a symmetric random walk, starts at $P_0 = (0, 0)$,
and ends up at $P_2 = (2T, 2X)$, what is the spatial probability distribution at t = T? All
paths are comprised within the forward light-cone from $P_0$ and the backward light-cone
from $P_2$; thus, the range of positions at T is restricted to the solid line in the middle. The
particle's mean velocity from $P_0$ to $P_2$ is $\beta = X/T$; the mean for the subpath $P_0P_1$ is $\beta_1$;
that for $P_1P_2$, $\beta_2$. The requested distribution has both mean and maximum at x = X
(hollow dot).




Fig. 21.18. For any given time t, the probability distribution for the particle to be at x
has a mean indicated by the line of slope $\beta$ and a standard deviation $\sigma$ indicated by the
width of the surrounding ellipse. The figure on the right uses the data from the example
(T = 10,000, X = 8,000); even when showing a $5\sigma$ deviation, the width of the ellipse is
barely noticeable.


21.10 On two levels¹²


You may think it impertinent that I took the random walk (no inertia, total
dependence on external whims) as a model of the motion of a free particle governed
purely by inertia. We certainly need to test these ideas with more realistic models
of dynamics. To this purpose, and in order to establish a link between action
and amount of computation, we shall introduce a computer-like system to which
the Lagrangian and Hamiltonian formalisms can be applied verbatim. By moving
back and forth between the computational and the physical interpretation of
this system we will be able to establish correspondence rules between physical and
computational constructs.


¹² This phrase is borrowed from Kadanoff [21].


21.10.1 Chains and strings 


Consider a linear chain of dots running along the x-axis with two dots for every
unit of length (Fig. 21.19). The displacement of a dot from a horizontal reference
line will be recorded along the q-axis. A dot is connected to its two neighbors
by links of slope ±1. Thus, on a macroscopic scale the chain will appear as a
continuous string whose slope in the x–q plane never exceeds 1 in magnitude.


Fig. 21.19. A chain with links of slope ±1, stretching along the x-axis and with
displacements recorded along the q-axis (top). Macroscopically, the chain will look like a
continuous string whose slope in the x–q plane never exceeds 1 in magnitude (bottom).


For brevity, coordinates having integer values (1, 2, 3, ...) will be called even;
those having half-integer values (1/2, 3/2, 5/2, ...), odd. We are interested in the class of
chain dynamics obeying the following constraints (Fig. 21.20):


• At even times, the candidates for a move are the dots at even places (solid
dots); at odd times, those at odd places (hollow dots).


• If a candidate can hop up or down one unit in the q direction while retaining
links of slope ±1 with its neighbors, then it is up to the specific dynamics to
decide whether it will do so (flip). Otherwise, the candidate will remain in
place (rest).

Thus a dynamics is just a table that, for any q, x, t that are all three odd or all
three even, specifies whether the candidate dot is permitted to flip. Whether a dot
will actually flip depends not only on whether the adjacent links allow it to flip, but
also on whether the permission bit allows it to take advantage of this possibility.


Fig. 21.20. Two successive configurations of a discrete chain with links of slope ±1. At
even steps, the even dots (solid) try to move; at odd steps, the odd dots (hollow). The
arrows indicate the moves that are possible without breaking the links. A possible move
will actually be made (the dot will flip) only if the dynamics gives permission; otherwise
the dot will rest. In this figure, permission was not granted at t = 0 for the dot at x = 3,
so that at time t = 1/2 we find it at the same position.


Fig. 21.21. At its finest level, the program of the chain computer is just a binary table
containing a permission bit for each point (q, x, t) such that all three coordinates are even
or all odd (“body-centered cubic lattice”). Each frame displays, in an interleaved fashion,
data for two consecutive values of t (even t at even q, x; odd t at odd). For q < 2, this
dynamics always grants permission; for q > 2, it grants permission on two half-steps and
then withholds it for two more half-steps.


One can think of the present setting as a special-purpose computer; the data are
the states (±1) of the chain's links; the program is the permission table itself. That
is, we view spacetime (here including for convenience q as another spatial coordinate)
as a read-only memory containing a bit at each point. The processor is just
the mechanism that interprets the data as a linked chain and the program as a
dynamics, flipping a dot if and only if its links (the general kinematic constraints) and
the permission bit (the specific dynamics) permit it. Note that this is a reversible
computer, which can be operated indifferently in “forward” or “reverse”.¹³


A sequence of chain configurations compatible with a given dynamics is a tra- 
jectory of that dynamics. A sequence that just obeys the kinematic links will be 
called a history (note that a history is a trajectory of some dynamics). 


What kinds of behavior can such a chain display as one ranges over the possible
dynamics? At one extreme, if the dynamics never grants permission, then no flips
can possibly occur: in the identity dynamics, any initial configuration remains
forever unchanged. At the other extreme, suppose that permission is always granted.
Then one can immediately verify that a configuration consisting of all +1 links
(Fig. 21.22a) or all −1 links cannot move at all. On the other hand, a configuration
consisting of regularly alternating +1 and −1 links, and thus having an average slope
of 0, will steadily march vertically at unit speed, moving upwards or downwards
depending on whether the black dots are in the valleys or on the peaks at even
times (Fig. 21.22c,d). It turns out that, for an arbitrary chain configuration, this
permissive, “flip-whenever-you-can” dynamics gives exact wave equation behavior;
this will be proved in §21.10.2, where we also discuss other dynamics that can be
“programmed” in our chain computer.


(a) (b) (c) (d) 


Fig. 21.22. Examples, when permission to flip is always granted: A chain with slope +1 
(or —1) cannot move at all (a); a kink on it will propagate at unit speed (b). A chain with 
alternating +1 and —1 links will march at unit speed upwards or downwards—depending 
on whether the black dots start in valleys or on ridges (c,d). 


In sum, we have defined a whole class of dynamics; the general format is given
by the kinematic constraints (try to flip at even or odd places at, respectively, even
or odd times, and only if the links allow it), while the specifics are given by the
permission table (flip only if the permission bit is set).
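
The rules just summarized can be sketched in a few lines. This is a minimal illustration under stated simplifications, not the author's implementation: the chain is closed into a ring, its state is stored as a list of link slopes, time is counted in half-steps `tau`, and the permission table is reduced to a hypothetical function `permitted(dot, tau)` that ignores the q coordinate.

```python
def half_step(links, tau, permitted=lambda dot, tau: True):
    """Advance a closed chain by one half-step.

    links[k] is the slope (+1 or -1) of the link joining dot k to dot k + 1;
    dot k sits between links[k - 1] and links[k] (indices wrap around).
    Returns the new link list and the net number of upward hops.
    """
    n = len(links)
    assert n % 2 == 0, "need an even number of dots so the two parities alternate"
    new, hops = list(links), 0
    for dot in range(tau % 2, n, 2):                 # candidates share the parity of tau
        left, right = links[dot - 1], links[dot]
        if left != right and permitted(dot, tau):    # ridge or valley, and allowed
            new[dot - 1], new[dot] = right, left     # flip: ridge <-> valley
            hops += right                            # a valley hops up, a ridge hops down
    return new, hops

def run(links, half_steps, permitted=lambda dot, tau: True):
    """Iterate half_step and accumulate the net upward displacement (in hops)."""
    total = 0
    for tau in range(half_steps):
        links, hops = half_step(links, tau, permitted)
        total += hops
    return links, total

straight = [+1] * 8            # all +1 links: no dot can ever move
zigzag = [+1, -1] * 4          # alternating links: even dots start in valleys
print(run(straight, 10))
print(run(zigzag, 4))          # 4 half-steps = 2 time units
```

With permission always granted, the straight chain stays put while every dot of the zigzag chain rises one unit per time unit; passing `permitted=lambda dot, tau: False` gives the identity dynamics.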


¹³ More than that, given appropriate interlocks it can go forward, backward, or idle independently
at each point of the chain.


21.10.2 Means and flows


To get from a discrete chain to a continuous string, so as to get differentiable 
quantities which we can use in variational arguments, we must go to a macroscopic 
level; i.e., consider space averages over lengths that are large with respect to the 
lattice spacing, and time averages over a large number of steps. Ideally, coarse- 
grained averaging should commute with the dynamics; that is, if we average the 
microscopic state variables and then let this average evolve for a certain time using 
an appropriate macroscopic dynamics we should get the same result as if we had let 
them evolve over the same time using the microscopic dynamics on the microscopic 
data and then done the averaging. 


At the microscopic level, knowledge of the positions q(x) completely specifies 
the system’s state; that is, the assignment of q(x) at a given instant completely 
determines the subsequent microscopic evolution of the system under a given dy- 
namics. More specifically, initial rate-of-change information—which dots are going 
to flip at the initial step—is not part of the initial specifications but is dictated by 
the dynamics itself. 


The situation is different at the macroscopic level. As a smeared-out, macroscopic
state variable, q(x) does not capture enough information about the state of
the system to uniquely determine its macroscopic evolution. In fact, as one can see
in Fig. 21.22c,d, two chains with the same q(x) may move one up and the other
down! More macroscopic data than just q(x), namely, the rates of change $\partial q(x, t)/\partial t$
(or some equivalent information) are needed to give a self-contained macroscopic
dynamics, and, as we shall see, they are indeed sufficient.


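Coming back to the microscopic level, let us fix our attention on one candidate
dot (Fig. 21.20) and the unit-space cell surrounding it; this cell contains two links,
one on the right and one on the left of the dot itself, whose values we shall call
respectively $\rho_\rightarrow$ and $\rho_\leftarrow$. From these we construct, by symmetrization, the new
quantities

$$\rho = (\rho_\rightarrow + \rho_\leftarrow)/2, \qquad j = (\rho_\rightarrow - \rho_\leftarrow)/2. \tag{21.10}$$

On a unit-cell scale, the possible values for $\rho$ and j are

($\rho_\rightarrow$, $\rho_\leftarrow$) | (+1, +1) | (+1, −1) | (−1, +1) | (−1, −1)
$\rho$ | +1 | 0 | 0 | −1
j | 0 | +1 | −1 | 0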

Thus, along the chain, we have the two microscopic sequences of links, $\rho_\rightarrow(x)$ and
$\rho_\leftarrow(x)$, and the two derived sequences $\rho(x)$ and j(x); for example, for Fig. 21.20 we
have

$\rho$ | +1 | 0 | +1 | 0 | −1 | 0
j | 0 | +1 | 0 | −1 | 0 | −1          (21.11)

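Under coarse graining, $\rho(x, t)$ represents the average slope of the chain (in
the x–q plane) while $-j(x, t)$ represents its average velocity (the “slope” in the t–q
plane):

$$\frac{\partial q}{\partial x} = \frac{\rho_\rightarrow + \rho_\leftarrow}{2} = \rho, \qquad \frac{\partial q}{\partial t} = -\frac{\rho_\rightarrow - \rho_\leftarrow}{2} = -j. \tag{21.12}$$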


To integrate the dynamics we have to determine the evolution of $\rho$ and j, or,
equivalently, of $\rho_\rightarrow$ and $\rho_\leftarrow$. It turns out that if permission to flip is always granted, then
the two sequences $\rho_\rightarrow$ and $\rho_\leftarrow$ of (21.11) shift respectively rightwards and leftwards
one position at every step without interacting. In other words, the dependency on
x and t is such that

$$\rho_\rightarrow(x, t) = \rho_\rightarrow(x - t), \qquad \rho_\leftarrow(x, t) = \rho_\leftarrow(x + t). \tag{21.13}$$


In this case, then, the sequence $\rho$ can be thought of as the superposition of two
traveling strings that move at unit speed in opposite directions. As we know, this
is the general solution of the one-dimensional wave equation. Thus, we conclude that under
this “permissive” dynamics the string q(x, t) strictly obeys the one-dimensional wave
equation

$$\partial_{tt} q = \partial_{xx} q, \tag{21.14}$$

on any scale, with no damping whatsoever.¹⁴
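
The non-interacting shift of the two trains can be verified on a small ring. The sketch below is an illustration under assumptions: the chain is closed into a ring of links, time is counted in half-steps, and the right-moving train is taken to be the links lying just left of the current candidate dots (an indexing convention chosen here for concreteness, not notation from the text).

```python
import random

def permissive_half_step(links, tau):
    """Flip-whenever-you-can rule on a closed chain of +/-1 links.

    A candidate dot flips only when its two links differ, and flipping swaps
    them; when they are equal a swap changes nothing. The rule therefore
    amounts to swapping the two links around every candidate dot.
    """
    new = list(links)
    for dot in range(tau % 2, len(links), 2):
        new[dot - 1], new[dot] = links[dot], links[dot - 1]
    return new

def trains(links, tau):
    """Split the links into the right-moving and left-moving trains at half-step tau."""
    n = len(links)
    right_movers = {k: links[k] for k in range(n) if (k - tau) % 2 == 1}
    left_movers = {k: links[k] for k in range(n) if (k - tau) % 2 == 0}
    return right_movers, left_movers

random.seed(0)
n = 12
links = [random.choice((-1, 1)) for _ in range(n)]
for tau in range(5):
    r0, l0 = trains(links, tau)
    links = permissive_half_step(links, tau)
    r1, l1 = trains(links, tau + 1)
    # Every car moves one link per half-step, with its contents untouched.
    assert all(r1[(k + 1) % n] == v for k, v in r0.items())
    assert all(l1[(k - 1) % n] == v for k, v in l0.items())
print("both trains shifted rigidly for 5 half-steps")
```

Because the update never inspects the values it moves, the same check passes for any initial configuration, which is the content of (21.13).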


If permissions are assigned randomly and independently with a density 1 − s,
then, with reference to Fig. 21.23, where the two trains $\rho_\rightarrow$, $\rho_\leftarrow$ are shown running
on opposite tracks, when two cars pass by one another they will swap contents with
probability s. The result is that, for small s, these two quantities are related by

$$\partial_t \rho_\rightarrow = -\partial_x \rho_\rightarrow - s(\rho_\rightarrow - \rho_\leftarrow),$$
$$\partial_t \rho_\leftarrow = +\partial_x \rho_\leftarrow + s(\rho_\rightarrow - \rho_\leftarrow), \tag{21.15}$$

and thus $\rho$, j, and q all satisfy the telegraph equation

$$\partial_{tt} u = \partial_{xx} u - 2s\,\partial_t u,$$

which is essentially the wave equation with damping proportional to s. For fixed s,
if we scale t by k and x by $\sqrt{k}$, then in the limit as $k \to \infty$ the telegraph equation
turns into the diffusion equation,

$$\partial_t u = \frac{1}{2s}\,\partial_{xx} u.$$


¹⁴ Norman Margolus and I came upon this model around 1983. On a computer simulation, it
is impressive to see the full interplay of elasticity and inertia emerge from such simple discrete
primitives.

Nonrandom tables, and especially tables with a fractal structure, need not lead
to an emerging macroscopic dynamics. But in all cases we get a linear dynamics,
since the $\rho_\rightarrow$ and $\rho_\leftarrow$ tokens of Fig. 21.23 are switched without even being looked
at: they do not interact! Note that, in a measure-theoretical sense, almost all
permission tables yield a uniform and independent distribution of permissions with
density 1/2. In this sense, the typical dynamics of our class is that of a heavily
damped string.


Fig. 21.23. The sequences $\rho_\rightarrow$ and $\rho_\leftarrow$ may be thought of as trains traveling in opposite
directions at unit speed. A request to flip (in the original chain) corresponds to two train
cars passing by one another; if the permission is denied, then the two cars swap contents,
so that a bit that was traveling rightwards now starts moving leftwards and vice versa.
This leads to scattering in the $\rho_\rightarrow$–$\rho_\leftarrow$ picture, corresponding to damping in the q picture.


21.10.3 Inertia of the lumped string 


We shall now focus our attention on the harmonic string behavior, as supported by 
the permissive dynamics in our toy computer. 


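Let us close the string into a loop of length m (“periodic boundary conditions”
q(x + m) = q(x)), and consider the position of its center of mass (on the q-axis) as a
function of time. Whatever the configuration of the string, this point will move at
a constant velocity $\beta$ (since $\int j\,dx$ is strictly conserved) in the range $-1 \le \beta \le +1$,
with 2m + 1 discrete possible values

$$\beta = i/m, \quad \text{for } i = -m, \ldots, -1, 0, +1, \ldots, +m.$$

We will have

$$\rho_\rightarrow + \rho_\leftarrow = 0, \quad \rho_\rightarrow - \rho_\leftarrow = -2\beta, \qquad \text{or} \qquad \rho_\rightarrow = -\beta, \quad \rho_\leftarrow = +\beta. \tag{21.16}$$

Of the m elements of the train $\rho_\rightarrow$, a fraction $\rho_\rightarrow^+$ will consist of +1s and the rest,
$\rho_\rightarrow^-$, of −1s, that is,

$$\rho_\rightarrow^+ + \rho_\rightarrow^- = 1, \qquad \rho_\rightarrow^+ - \rho_\rightarrow^- = \rho_\rightarrow, \tag{21.17}$$

and similarly for $\rho_\leftarrow$. Thus, the number of configurations compatible with velocity
$\beta$ is (using the same notation as in §21.9)

$$N = \binom{m}{\rho_\rightarrow^+ m,\ \rho_\rightarrow^- m} \binom{m}{\rho_\leftarrow^+ m,\ \rho_\leftarrow^- m}, \qquad \rho_\rightarrow^+ = (1 - \beta)/2, \quad \rho_\leftarrow^+ = (1 + \beta)/2, \tag{21.18}$$

or

$$\ln N \approx 2mK(\beta) \approx -m\beta^2 + \text{const}, \tag{21.19}$$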


which is essentially the same dependency on $\beta$ as (21.9). This will of course give the
correct results as the Lagrangian of the “lumped-string” free particle; moreover, the
appearance of a mass factor m makes it possible to study the interactions between
particles of different masses¹⁵ and see how the number of trajectories varies as a
function of different masses and velocities.


21.10.4 Refraction 


Here we'll briefly consider the case of a nonhomogeneous medium. If the permission
table withholds permission for the two consecutive steps that make up a time unit
(cf. top part of Fig. 21.21), then the evolution of the chain is frozen for that time
unit. These “suspensions” may be randomly scattered through time, reducing the
effective speed by a factor of n (see [9] and [22] for examples of this approach in
lattice-gas dynamics). Suppose (Fig. 21.24) that the particle starts at the origin,
traverses free space (“index of refraction” k = 1) until time t, and then traverses a
stretch of time with “index of refraction” k = n, finally landing at Q. What is the
most likely value for q when the particle enters the denser medium? Equivalently,
what is the “refraction angle”?


We will derive the Lagrangian directly from the microscopic dynamics, always
assuming that the Lagrangian must be an indicator of the number of trajectories
compatible with the given data. In the denser medium, whatever the macroscopic
velocity v, the particle must be following a sequence of configurations appropriate for
a velocity v′ = nv, with the only difference that on time slots when permissions are
withheld the current configuration will be held frozen. Thus, in the denser medium,
the Lagrangian must reflect the path statistics of this higher “internal” velocity
rather than that of the apparent velocity v.


¹⁵ Interactions between chains require a slightly more complex CPU for our computer, able to
look further than just first neighbors when considering a flip.

Fig. 21.24. A lumped-string particle travels until t through a 100% permissive medium
(“vacuum”); thereafter, permission is granted only every nth time slot (“index
of refraction” n). If the average velocity in the denser medium is v, the string must be
moving at an actual speed v′ = nv when moving at all, and the path statistics that enters
in the Lagrangian must be that corresponding to this higher speed v′.


Multiplying the velocity by a factor of n in (21.16) shifts the statistics in (21.18)
so that a factor of $n^2$ appears in (21.19); rescaling time by a factor 1/n, and then
moving this factor from the dt term to the L term in L dt adds to the Lagrangian
a factor 1/n, so that the correct Lagrangian for index of refraction n is

$$L_n(v) = \frac{1}{n}\,L_{\text{vacuum}}(nv) = n m v^2.$$

This expression for the Lagrangian indeed gives, upon extremizing the action
integral, the correct actual trajectory with q/t = n(Q − q)/(T − t).
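
The refraction condition can be checked by brute force. The sketch below assumes a two-segment path (straight in vacuum until time t, straight in the medium until time T) and takes the Lagrangians as $m v^2$ and $n m v^2$, with the sign convention that the actual path minimizes the summed action. The numerical values and function names are illustrative.

```python
def action(q, t, Q, T, n, m=1.0):
    """Action for reaching height q at time t in vacuum, then Q at time T at index n."""
    v_vacuum = q / t
    v_medium = (Q - q) / (T - t)
    return m * v_vacuum ** 2 * t + n * m * v_medium ** 2 * (T - t)

def best_crossing(t, Q, T, n, grid=100_001):
    """Grid search for the crossing height q in [0, Q] that minimizes the action."""
    candidates = (Q * i / (grid - 1) for i in range(grid))
    return min(candidates, key=lambda q: action(q, t, Q, T, n))

t, Q, T, n = 3.0, 4.0, 10.0, 2          # illustrative values
q = best_crossing(t, Q, T, n)
print("q/t            =", q / t)
print("n(Q-q)/(T-t)   =", n * (Q - q) / (T - t))
```

Up to the grid resolution the two printed slopes coincide, and q agrees with the closed form n Q t / ((T − t) + n t) obtained by solving the stated condition for q.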


21.10.5 The string as an extended system 


We shall now look at a chain of indefinite length undergoing harmonic motion as a 
spatially extended system. 


If we denote by $\mathcal{K}$, $\mathcal{U}$, $\mathcal{H}$, and $\mathcal{L}$ the densities of respectively kinetic energy,
potential energy, total mechanical energy (Hamiltonian), and Lagrangian, the
correspondence with analytical mechanics for the harmonic chain, given by (21.12) and
(21.14), is completed by setting

$$\mathcal{K} = j^2, \qquad \mathcal{H} = \mathcal{K} + \mathcal{U},$$
$$\mathcal{U} = \rho^2, \qquad \mathcal{L} = \mathcal{K} - \mathcal{U}, \tag{21.20}$$

whence

$$\mathcal{H} = j^2 + \rho^2 = \tfrac{1}{2}\left(\rho_\rightarrow^2 + \rho_\leftarrow^2\right); \tag{21.21}$$

$$\mathcal{L} = j^2 - \rho^2 = -\rho_\rightarrow \rho_\leftarrow. \tag{21.22}$$

We can now go back and forth between the two levels, and find the microscopic
interpretation, in this computational model, of the usual macroscopic concepts.
For example, the quantity $\rho$ represents the amount of stretch of the chain from
the flat configuration in which links are randomly oriented up and down, and its
square represents the energy stored in this “spring”. The quantity j represents the
momentum of the string (the difference between the number of the “valleys”, where
the chain moves up, and that of “peaks”, where it moves down), and its square
represents the energy stored in the “inertia”. The sum of these squares is of course
the total mechanical energy $\mathcal{H}$, which from the rightmost equality in (21.21) can
also be seen as the sum of the energies of the two traveling waves $\rho_\rightarrow$ and $\rho_\leftarrow$.


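For a combinatorial interpretation of $\mathcal{H}$, consider a piece of chain of length m
and proceed as in (21.18), but with $\rho_\rightarrow^+$ and $\rho_\leftarrow^+$ more generally given by

$$\rho_\rightarrow^+ = (1 + \rho_\rightarrow)/2, \qquad \rho_\leftarrow^+ = (1 + \rho_\leftarrow)/2. \tag{21.23}$$

We obtain

$$N = \binom{m}{\rho_\rightarrow^+ m,\ \rho_\rightarrow^- m} \binom{m}{\rho_\leftarrow^+ m,\ \rho_\leftarrow^- m},$$

or

$$\frac{1}{m} \log N = K(\rho_\rightarrow) + K(\rho_\leftarrow) \approx -\tfrac{1}{2}\left(\rho_\rightarrow^2 + \rho_\leftarrow^2\right) + \text{constant} = -\mathcal{H} + \text{constant}.$$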


Thus, in this model of the elastic string, the energy $\int \mathcal{H}(x)\,dx$ measures, on a
log scale, the number of microscopic chain configurations compatible with a given
macroscopic assignment of positions and velocities. A low-energy state is “cheap”
because it is “common”: there are so many ways to achieve it. Conversely,
high-energy states are “rare”. In fact, the four states of maximal energy (j = 0, $\rho = \pm 1$,
as in Fig. 21.22a; or $j = \pm 1$, $\rho = 0$, as in Fig. 21.22c,d) are each represented by a
single microscopic configuration.


Since the underlying cellular automaton is microscopically reversible, its fine- 
grained entropy is strictly constant—rare states map into rare states, common ones 
into common ones; in this model, thence, energy conservation is just a macroscopic 
expression of microscopic reversibility. 


Note that H depends on the coarseness of graining; vibrations of a wavelength 
shorter than this grain do not contribute to H, and may be viewed as thermalized 
degrees of freedom. 


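From (21.22), the wave equation (21.14) follows immediately by the Euler–Lagrange
equation

$$\frac{\partial}{\partial t}\frac{\partial \mathcal{L}}{\partial(\partial_t q)} + \frac{\partial}{\partial x}\frac{\partial \mathcal{L}}{\partial(\partial_x q)} = 0. \tag{21.24}$$

By (21.22), $\mathcal{L} = -\rho_\rightarrow \rho_\leftarrow$. Expanding $\rho_\rightarrow \rho_\leftarrow$ by (21.17) yields

$$\rho_\rightarrow \rho_\leftarrow = \left(\rho_\rightarrow^+ \rho_\leftarrow^+ + \rho_\rightarrow^- \rho_\leftarrow^-\right) - \left(\rho_\rightarrow^+ \rho_\leftarrow^- + \rho_\rightarrow^- \rho_\leftarrow^+\right). \tag{21.25}$$

In the above equation, the four terms in parentheses represent the probabilities that
two consecutive chain links, the first from $\rho_\rightarrow$ and the other from $\rho_\leftarrow$, form

$\rho_\rightarrow^+ \rho_\leftarrow^+$: a +1 slope
$\rho_\rightarrow^- \rho_\leftarrow^-$: a −1 slope
$\rho_\rightarrow^+ \rho_\leftarrow^-$: a ridge
$\rho_\rightarrow^- \rho_\leftarrow^+$: a valley

(cf. Fig. 21.22).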


If a chain's evolution obeys the permissive rule, a ridge yields a downward flip
(cf. Fig. 21.20); a valley, an upward flip; and ±1 slopes yield a rest. Thus, on any
small patch of a proposed macroscopic spacetime history,

$$\mathcal{L}_{\text{actual}} = \text{density of flips} - \text{density of rests} = 2\,(\text{density of flips}) + \text{constant}. \tag{21.26}$$
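
A quick Monte Carlo check of (21.26) is sketched below. It assumes, as the probabilistic reading of (21.25) requires, that the two links of a cell are drawn independently with the macroscopic means of their trains; the argument names `rho_right` and `rho_left` are hypothetical labels for those two means.

```python
import random

def cell_statistics(rho_right, rho_left, cells, rng):
    """Sample unit cells with independent +/-1 links of the given means.

    Returns (density of flips - density of rests, -rho_right * rho_left).
    """
    def draw(mean):
        return 1 if rng.random() < (1 + mean) / 2 else -1

    flips = rests = 0
    for _ in range(cells):
        first, second = draw(rho_right), draw(rho_left)
        if first != second:
            flips += 1        # ridge or valley: the permissive rule flips the dot
        else:
            rests += 1        # +1 or -1 slope: the dot must rest
    return (flips - rests) / cells, -rho_right * rho_left

measured, predicted = cell_statistics(0.6, -0.2, 200_000, random.Random(1))
print(measured, predicted)
```

The measured excess of flips over rests matches the product form of the Lagrangian density to within sampling noise; since flips and rests together fill every cell, the same number equals twice the flip density minus one.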


Consequently, given spacetime boundary conditions for the macroscopic string
configuration (e.g., the entire string at $t_0$ and $t_1$), the histories that are actual solutions
of the permissive dynamics are those that fill in the intervening spacetime area with
the least number of flips.


One might be tempted to say, “Aha! Here is an example of nature's parsimony!”
and stop here. But, of course, with a different rule the actual trajectory may not
be that which minimizes the number of flips. Something deeper and more general
is going on, which is better seen by turning from the figure to the background.
A trajectory that minimizes the number of flips is by the same token one that
maximizes the number of rests. Now, in this dynamics, the rest points are all and
only those spacetime points where no motion was possible to begin with, because of
kinematic constraints. Thus a rest is a point where withholding permission would not
have made any difference! For any particular rest point in any particular history,
a table with a 0 at that point instead of a 1 would have yielded the same history;
consequently, that history is also a trajectory of this variant table. The actual data
flow is that which maximizes the number of possible tables that could have given
the same data flow; in other words, it is the behavior that is maximally indifferent
to the rule. If the microscopic rule is not known precisely or, equivalently, is subject
to noise, those macroscopic behaviors that are less dependent on the details of the
rule will be more stable and will recur more often.


We have seen that, in the Hamiltonian, an increase in either potential or kinetic
energy leads to fewer available states. In the Lagrangian, the potential energy
enters with the sign opposite to that in the Hamiltonian. If we were looking for the maximum
number of states, that would lead us in the wrong direction. But states of high $\mathcal{U}$
are rare because they are special, and they are special in that they have high
symmetry in relation to the dynamics; for this reason, a greater number of microscopic
implementations of the dynamics leave them unaffected.¹⁶


21.11 Conclusions 


If we extrapolate from the above examples we are led to the following ideas: 


• The Lagrangian, which is a function not only of the independent parameters
x and t but most essentially of the dependent variables q and their first
derivatives, measures the multiplicity of microscopic laws that are consistent
(a) with an agreed-upon set of kinematic constraints, (b) with the specified
dynamics, which must be viewed merely as a specification of macroscopic
trajectories, and thus encompasses a whole ensemble of possible microscopic
dynamics, and (c) with the proposed segment of macroscopic trajectory. The
two terms U and K express different kinds of contribution to this multiplicity
count.


• A segment of trajectory with high potential energy U is one whose shape is
heavily constrained by the general constraints specifying the class of laws, and
thus is less affected by the choice of a particular law of this class.


• A segment of trajectory with high kinetic energy K is one in which much
computation capacity is spent in transporting data, and thus less capacity
remains for bringing about internal evolution of the data themselves.


• The action integral measures the computation capacity (multiplicity of
microscopically accessible paths under the above constraints) offered by the
substrate to a proposed macroscopic trajectory.


• The more this amount of computation capacity exceeds what is strictly needed
for the proposed macroscopic evolution, the greater the number of microscopic
ways that the latter can be fulfilled (typically, by wasting extra capacity in
aimlessly thrashing), and thus the greater the chances that the proposal will
be carried out.

¹⁶ For an analogy, a circle is more symmetric under rotations than a triangle, and for this reason
there are fewer circle shapes than triangle shapes. But for the same reason there are more rotations
that will leave a circle unaffected.

In sum, by postulating a fine-grained dynamical substrate underlying the given
dynamics, one turns analytical mechanics into a form of statistical mechanics, 
though applied to microlaws rather than to microstates. One views the actual 
trajectory as the peak of a distribution of microscopic trajectories, and interprets 
the “principle of least action” as a concise expression of combinatorial tautologies, 
much as the “survival of the fittest” and the “law of large numbers”. 


If all of this stands review, the principle of least action is an expression not of 
nature’s parsimony [2] but of nature’s prodigality: a system’s natural trajectory is 
the one that will hog the most computational resources. Would you have expected 
otherwise? 


Do not read me wrong. I am duly impressed by the enormous power of variational
principles; but when I use them I would like to know where it is that they
get their power from (caveat emptor!). I just don't believe in magic.


ALGORITHMIC RANDOMNESS, PHYSICAL
ENTROPY, MEASUREMENTS, AND THE
DEMON OF CHOICE


W.H. Zurek * 


Abstract 


Measurements (interactions which establish correlations between a system and
a recording device) can be made thermodynamically reversible. One might be
concerned that such reversibility will make the second law of thermodynamics
vulnerable to the designs of the demon of choice, a selective version of Maxwell's demon.
The strategy of the demon of choice is to take advantage of rare fluctuations to
extract useful work, and to reversibly undo measurements which do not lead to such
favorable but unlikely outcomes. I show that this threat does not arise as the
demon of choice cannot operate without recording (explicitly or implicitly) whether
its measurement was a success (or a failure). The thermodynamic cost associated with
such a record cannot be, on the average, made smaller than the gain of useful work
derived from the fluctuations.


22.1 Feynman 


When I was asked to write for a volume dedicated to Richard Feynman, I decided 
that I should select the subject in which I was influenced by him the most, and which 
would still be consistent with the overall theme of computation and physics. And 
these influences started well before I met him in person: I got Feynman’s “Lectures 
on Physics” more than a quarter century ago, in Polish translation, from my father. 
As a finishing high school student I was accompanying him on a hunting expedition 
in the lake district of Poland — a remote corner of the country. Every few days 
we drove for supplies to the provincial capital, and there I noticed the volumes 
in the local bookstore. My father asked why (the expense was considerable), but 
surprisingly easily gave way to my arguments. I spent much of the rest of the 
hunting vacation (a couple of weeks altogether) getting through volume I. 


Over the years I have developed a habit of treating the “Lectures” sort of like
a collection of poems. I like some “poems” more than others, and I return to the
favorites now and again. And when I am stuck with a physics problem, reading
a few of the relevant “poems” is often the best way to get “unstuck”. But there
are a few chapters which have been read over and over again without any such
ulterior motive, for sheer pleasure. Amongst them, I would certainly include the
discussion of the fluctuations and the second law (the famous “ratchet and pawl”
argument [1]).


Thermodynamic concerns and arguments have often presaged the deepest developments 
in physics. I suspect this is because thermodynamics “knows” about the 
physical relevance of information, and hence, it knew about the Planck constant, 
stimulated emission, black hole entropy, and so on. When I met Feynman in per- 
son for the first time (at a small workshop organised near Austin, Texas, by John 
Archibald Wheeler in the Spring of 1981), I remember — amongst other things — a 
thermodynamic argument he used to great effect to prove that one cannot acceler- 
ate elementary particles by shaking them together with a bunch of heavier objects, 
so that they could acquire equipartition kinetic energies (and therefore, because of 
their small mass, enormous momenta). This idea (credible at first sight, as it is akin 
to the Fermi acceleration of cosmic rays) was brought up by one of the participants. 
It would not work — Feynman argued — because all sorts of other modes of the 
vacuum would have to get their fair share of energy, creating an equilibrium heat 
bath, with approximate equipartition between all the modes (rather than with the 
energy in the elementary particles one really wanted to accelerate in the first place). 


But that was not the most vivid memory of that first encounter with the man 
whose “Lectures” I had acquired a decade or so earlier. Rather, I remember best 
that he showed up at the first lecture unshaved and uncombed, with dry grass in 
his hair. It turned out that he spent the night outside — apparently, he decided 
the accommodations for the speakers (which were in the posh tennis club) were too 
opulent, returned the key to his apartment at the reception, and decided to “camp 
out”. During the morning coffee he also reported in detail (and with great 
gusto) how he had trouble breaking the code to get into his briefcase (where he had 
the sweater; it got cold). He knew the code, of course, by heart, but it was the middle 
of the night, so he somehow had to dial it in complete darkness. He clearly relished 
the challenge. I do not remember how he solved the problem, but the flavor 
of the adventure and of his report was very much in the spirit of the “adventures 
of a curious character”. And all of this was a few months after his (first) cancer 
operation. 


I came to talk to Feynman regularly, more or less once a month, during my 
Tolman Fellowship at Caltech (which started in the Fall of 1981), and a bit less 
often for a few years afterwards. I also sat in occasionally on the class on physics 
and computation he taught with John Hopfield. And I remember discussing with 
him (among other subjects) the connection between physics, information and com- 
putation. In fact, this was a recurring theme. For me, it became somewhat of an 
obsession early on — I really liked the universality of Turing machines, the halt- 
ing problem, and the algorithmic view of information. While I was in Austin the 
fascination with these ideas and their possible relevance for physics was reinforced 
under the influence of John Wheeler. Which brings me, at long last, to the algorithmic 
information content, measurements, and various thermodynamic demons which 
probe the utility of acquired information. 


22.2 Algorithmic Information Content, Measurement and the 
Second Law 


Maxwell’s demon — a hypothetical intelligent entity capable of performing mea- 
surements on a thermodynamic system and using their outcomes to extract useful 
work — was considered a threat to the validity of the second law of thermodynamics 
for over a century [2, 3]. Feynman was fascinated with the subject, and his discussion 
of ratchet and pawl [1] banished forever the “unintelligent” trapdoor version 
of the demon by clarifying and updating the influential argument put forward by 
Smoluchowski [4] much earlier, and in a rather different setting.¹ However, 
Smoluchowski’s trapdoor carries out no (explicit) measurements. Therefore, trapdoors 
and ratchets and pawls can be analysed without reference to information [1, 4]. 


The complete Maxwell’s demon should be able to measure, and it (...?; he? 
she?!) should be of course intelligent. Smoluchowski’s trapdoor does not fit this 
bill. Measurements were incorporated into the discussion by Szilard [6], Landauer 
[7], and Bennett [8] who have argued, in a setting involving ensembles of demons, 
that the acquisition of information is only possible when the demon’s memory is 
repeatedly erased, to prepare it for the new data. The cost of erasure eventually 
offsets whatever thermodynamic advantages the demon’s information gain might 
offer. This point (which has come to be known as “Landauer’s principle”) is now 
widely recognised as a key ingredient of thermodynamic demonology. This originally 
classical reasoning has been since extended to quantum physics [9, 10], and may even 
be experimentally testable [11]. 


However, the widespread fascination with Maxwell’s demon is ultimately due to 
its intelligence. A demon will record a specific outcome of the measurement, and — 
using its intelligence — will try to make an optimal decision about the best possible 
action, which would maximize the work extracted from a given recorded phase space 
configuration. This is very much the course of action we take (although, fortunately 
for us, in a far-from-equilibrium setting). How can one convince an intelligent demon 
that, all cleverness notwithstanding, its attempts at defeating the second law are 
doomed? This is hard to accomplish at the level of ensembles: Each demon knows 
nothing but its own record, and need not care about the other members of “its 
ensemble” that have found out something else in their measurements; it will find 
out its solution to its own problem. 


¹ Smoluchowski’s original trapdoor was a hole surrounded by hairs combed so that they all come 
out on the same side of the partition between the two chambers (rather than a real trapdoor). 
Naively, this arrangement of hairs should favor molecules passing in the direction in which the hair 
is combed, and impede the reverse motion. Smoluchowski pointed out that thermal fluctuations 
will “ruffle the hair” and make this arrangement ineffective as a rectifier of fluctuations when 
the whole system is at the same fixed temperature. Numerical simulations of trapdoors confirm 
these conclusions [5]. They also show why our intuition based on far-from-equilibrium behavior of 
trapdoors can be easily misled. 


The ultimate analysis of Maxwell’s demon must involve a definition of intelli- 
gence, a characteristic which has been all too consistently banished from discussions 
of demons carried out by physicists. On the other hand, intelligence has been — 
since Turing and his famous test — often invoked in the discussions of computer 
scientists. To convince ourselves (and the intelligent demon) of the limits imposed 
by the second law we shall, following Ref. [12], adopt an operational definition of 
intelligence which arose in the context of the theory of computation. It is based 
on the so-called Church-Turing thesis [13] — which in effect formalizes Turing’s 
expectations about the “mental” capabilities of computers and states that intelligence 
is equivalent to the same kind of information processing that is in principle 
implementable on a universal computer. 


Using the Church-Turing thesis as a point of departure, the present author has 
demonstrated that even this intelligent threat to the second law can be eliminated: 
the original “smart” Maxwell’s demon can be exorcized. This is easiest to establish 
when one recognizes that the net ability of demons to extract useful work from 
systems depends on the sum of measures of two distinct aspects of disorder [12]: 


(i) The usual statistical entropy, given by: 

$$H(\rho) = -\mathrm{Tr}\,\rho \lg \rho \qquad (22.1)$$


where $\rho$ is the density matrix of the system, determines the ignorance of the 
observer. 


(ii) The algorithmic information content [14-20]: 


$$K(\rho) = |p^*_\rho| \qquad (22.2)$$


is given by the size (“|...|”), in bits, of the shortest algorithm ($p^*_\rho$) which, for an 
“operating system” of a given Maxwell’s demon, can reproduce the detailed description 
($\rho$) of the state of the system. $K(\rho)$ quantifies the cost of storing the acquired 
information, which is related to the randomness inherent in the state of the system 
revealed by the measurement. 


The Church-Turing thesis enters in this second algorithmic ingredient, as it 
involves an assumption that the intellectual abilities of Maxwell’s demons can be 
regarded as equivalent to those of a universal Turing machine: It is assumed that 
demons can execute programs (such as $p^*_\rho$) to reconstruct records of past measurements 
out of their optimally compressed versions, or to carry out other logical 
operations in optimizing performance. Algorithmic information content provides a 
well-defined measure of the storage space required to register the known characteristics 
of the system. 


Physical entropy [12] is the sum of the statistical entropy and of the algorithmic 
information content: 


$$Z(\rho) = H(\rho) + K(\rho) \qquad (22.3)$$


Above, it is assumed that the base for the logarithm in Eq. 22.1 is the same as the 
size of the alphabet used by the computer which constitutes the operating system 
of the Maxwell’s demon. In practice, it is customary and convenient to employ a 
binary alphabet, so that both $H(\rho)$ and $K(\rho)$ are measured in bits. 


In order to appreciate the physical significance of the algorithmic randomness 
contribution, it is useful to discuss the behavior of H, K and Z in the course of 
measurements and to follow the operations of the engines controlled by demons. 
In short, the two measures turn out to be complementary — not in the quantum 
sense, but a bit like kinetic and potential energy — and their sum is, on the average, 
conserved under optimal measurements carried out on an equilibrium ensemble. 
Analysis which leads to this conclusion was carried out by this author [10, 12] and 
extended by Caves [21]. Below we offer only a brief summary of the salient points. 


In the course of an ideal measurement on an equilibrium ensemble the decrease of 
ignorance is, on the average, compensated for by the increase of the size of the 
minimal record [12]: 

$$\Delta H \approx -\langle \Delta K \rangle . \qquad (22.4)$$


Consequently, physical entropy Z plays a role analogous to a constant of motion. 
The transformation of the state of the system is now, however, brought about 
by a demonical (rather than dynamical) evolution, by the act of acquisition of 
information. This “conservation law” can be demonstrated within the context of 
the algorithmic theory of information [10, 12, 21, 22]. However, its validity can be 
traced to coding theory [12, 21-23]. According to the noiseless coding theorem of 
Shannon [23], the minimal size L of the message required to encode information 
which corresponds to a decrease of entropy by $\Delta H$ is, on the average over all of the 
messages, bounded by: 

$$\Delta H \le L < \Delta H + 1$$


This inequality is used in the proof of Eq. 22.4 and is ultimately responsible for the 
constancy of the physical entropy Z in the course of the measurement [12, 21]. 
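
The coding bound quoted above can be checked numerically. The following is only a minimal sketch, not part of the original argument: it compares the Shannon entropy of a set of measurement outcomes with the average codeword length of a binary Huffman code, which is one optimal way of writing the record. The outcome probabilities and all function names are hypothetical and chosen here purely for illustration.

```python
import heapq
import math


def shannon_entropy(probs):
    """Statistical entropy H = -sum p lg p, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_lengths(probs):
    """Codeword lengths (in bits) of a binary Huffman code for `probs`."""
    if len(probs) == 1:
        return [1]
    # Heap items: (probability, unique tie-breaker, outcomes in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, idx1 = heapq.heappop(heap)
        p2, _, idx2 = heapq.heappop(heap)
        for i in idx1 + idx2:
            lengths[i] += 1  # each merge adds one bit to the codewords below it
        heapq.heappush(heap, (p1 + p2, counter, idx1 + idx2))
        counter += 1
    return lengths


def average_record_length(probs):
    """Average size L of the record, weighted by the outcome probabilities."""
    return sum(p * n for p, n in zip(probs, huffman_lengths(probs)))


# Hypothetical outcome probabilities of a four-outcome measurement.
outcome_probs = [0.4, 0.3, 0.2, 0.1]
delta_h = shannon_entropy(outcome_probs)
avg_len = average_record_length(outcome_probs)
print(f"entropy decrease = {delta_h:.4f} bits, average record = {avg_len:.4f} bits")
```

In this sketch the average record length always lies between the entropy decrease and the entropy decrease plus one bit, which is the property used in the proof of Eq. 22.4.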


The role of Z in determining the efficiency of demon-operated engines is the 
ultimate reason for regarding Z as physical entropy. For the total amount of work 
which can be extracted from a physical system in contact with a heat reservoir of 
temperature $T$ in the course of a cycle which involves a measurement ($\rho \to \rho_i$) and 
isothermal expansion ($\rho_i \to \rho$) can be made as large as, but no larger than: 


$$\Delta W = k_B T \left( Z(\rho) - Z(\rho_i) \right) \qquad (22.5)$$


To justify this last assertion, I shall appeal to Landauer’s principle [7], which formalizes 
earlier remarks of Szilard [6] and states that erasure of one bit of information 
from the memory carries a thermodynamic price of $k_B T$. Although Landauer’s principle 
assigns a definite price to the storage of information, this price need not be 
paid right away: A demon with a large unused memory can continue to carry out 
measurements as long as it has room to store information. However, such a demon 
poses no threat to the second law: Its operation is not truly cyclic. In effect, it op- 
erates by employing its initially empty memory as a low temperature (zero entropy) 
heat sink. 


Erasure of the results of used up measurements carries a price tag of 

$$\Delta W^- = k_B T \, \langle K(\rho_i) - K(\rho) \rangle , \qquad (22.6a)$$

which must be subtracted from the gain of useful work 

$$\Delta W^+ = k_B T \left( H(\rho) - H(\rho_i) \right) , \qquad (22.6b)$$


to obtain the net work extracted by the demon. This immediately justifies Eq. 22.5. 
The hybrid $Z$ is the physical entropy which provides the demon with an individual, 
personal measure of the potential for thermodynamic gains due to the information 
in its possession. It also demonstrates that a demon operating on a system in 
thermodynamic equilibrium will never be able to threaten the second law, for the 
ensemble average of $Z$ is at best conserved, so that $\langle \Delta Z \rangle \le 0$ in the course of the 
process of acquisition of information. 


22.3 Physical Entropy and the Demon of Choice 


This last assertion is, however, justified only if the demon is forced to complete each 
measurement-initiated cycle. One can, by contrast, imagine a demon of choice, an 
intelligent and selective version of Maxwell’s demon, who carries out to completion 
only those cycles for which the initial state of the system is sufficiently nonrandom 
(concisely describable, or algorithmically simple) to allow for a brief compressed 
record (small $K(\rho)$). This strategy appears to allow the demon to extract a sizeable 
amount of work ($\Delta W^+$) at a small expense ($\Delta W^-$). Moreover, if the measurements can be 
reversibly undone, then the ones with disappointing outcomes could be reversed 
at no cost. Such demons would still threaten the second law, even if the threat is 
somewhat more subtle than in the case of Smoluchowski’s trapdoor. 


Caves [22] has considered and partially exorcised such a demon of choice by 
demonstrating that in any case the net gain of work cannot exceed $k_B T$ per measurement. 
Thus, the demons would be, at best, limited to exploiting thermal fluctuations. 
Moreover, in a comment [24] on Ref. [22] it was noted that taking advantage 
of such fluctuations is not really possible. Here I shall demonstrate that the only 
decision-making process free of inconsistencies necessarily leaves in the observer’s 
(demon’s) memory a “residue” which requires eventual erasure. The least cost of 
erasure of this residue is just enough to restore the validity of the second law. The 
aim of this paper is to make this argument (first put forward by this author at the 
meeting of the Complexity, Entropy, and the Physics of Computation network of 
the Santa Fe Institute in April of 1990) more carefully and more precisely. 


To focus on a specific example, consider Gabor’s engine [25] illustrated in Fig. 22.1. 
There, the unlikely but profitable fluctuation occurs whenever the gas molecule is 
found in the small compartment (of length $\ell$) of the engine (of total length $L$). 
The amount of extractable work is: 


$$\Delta W^+_p = k_B T \lg(L/\ell) \qquad (22.7)$$

The expense (measured by the used up memory) is only: 

$$\Delta W^- = k_B T , \qquad (22.8)$$

so that the net gain of work per each successful cycle is: 

$$\Delta W_p = k_B T \left( \lg(L/\ell) - 1 \right) \qquad (22.9)$$

The more likely “uneconomical” cycles would allow a gain of work: 

$$\Delta W^+_u = k_B T \lg \frac{L}{L-\ell} , \qquad (22.10)$$


so that the cost of memory erasure (still given by Eq. 22.8) outweighs the profit, 
leaving the net gain of work: 


$$\Delta W_u = -k_B T \left( 1 - \lg \frac{L}{L-\ell} \right) . \qquad (22.11)$$

Here the subscripts $p$ and $u$ label the profitable and the unprofitable outcome, respectively. 


When each measurement is followed by the extraction and erasure routine, the 
averaged net work gain per cycle is negative (i.e., it becomes a loss): 

$$\langle \Delta W \rangle = \frac{\ell}{L} \Delta W_p + \frac{L-\ell}{L} \Delta W_u = -k_B T \left( 1 + \frac{\ell}{L} \lg \frac{\ell}{L} + \frac{L-\ell}{L} \lg \frac{L-\ell}{L} \right) \qquad (22.12)$$

The break even point occurs for the case of Szilard’s engine [6], where the partition 
divides the container in half. In the opposite limit, $\ell/L \ll 1$, almost every 
measurement leads to an unsuccessful case which results in a negligible amount of 
extracted work but undiminished cost of erasure per cycle. 
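
The bookkeeping of Eqs. 22.7 to 22.12 is easy to tabulate numerically. The sketch below is an illustration under stated assumptions rather than anything taken from the original analysis: work is measured in units of the erasure cost of one bit (the parameter `kT`, set to 1), the function name `gabor_cycle` is hypothetical, and the only input is the ratio of the small compartment to the whole engine.

```python
import math


def gabor_cycle(ell_over_L, kT=1.0):
    """Net work for the profitable and unprofitable outcomes, and their average.

    `ell_over_L` is the fraction of the engine occupied by the small compartment.
    """
    x = ell_over_L
    gain_profitable = kT * math.log2(1 / x)          # Eq. 22.7
    gain_unprofitable = kT * math.log2(1 / (1 - x))  # Eq. 22.10
    erasure = kT                                     # Eq. 22.8, one bit per cycle
    net_profitable = gain_profitable - erasure       # Eq. 22.9
    net_unprofitable = gain_unprofitable - erasure   # Eq. 22.11
    # Eq. 22.12: outcomes weighted by the probability of finding the molecule
    # in the small (x) or the large (1 - x) compartment.
    average = x * net_profitable + (1 - x) * net_unprofitable
    return net_profitable, net_unprofitable, average


for ratio in (0.5, 0.25, 0.1, 0.01):
    net_p, net_u, avg = gabor_cycle(ratio)
    print(f"l/L = {ratio:5.2f}: profitable {net_p:+.3f}, "
          f"unprofitable {net_u:+.3f}, average {avg:+.3f}")
```

The average vanishes only when the partition divides the engine in half (Szilard’s engine) and is negative for every other ratio, even though the rare profitable outcome looks increasingly attractive as the small compartment shrinks.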


The design of the demon of choice attempts to capitalize on precisely this other- 
wise unprofitable limit by undoing all of the likely (and unprofitable) measurements 
at no thermodynamic cost, thus avoiding the necessity for erasure of the unused 
outcomes. It is important to emphasize that a measurement of the thermodynamic 
quantities can be indeed undone at no cost: A prejudice that measurement must be 
thermodynamically expensive goes back at least to the ambiguities in the original 
paper of Szilard [6] (who hinted at, but failed to clearly identify, erasure as 
the only thermodynamically expensive part of the measuring process), and was fur- 
ther reinforced by the popular (but incorrect) discussion of Brillouin [26]. Fig. 22.2 
demonstrates how to carry out a measurement on a particle in Gabor’s engine 
(such a measurement becomes reversible when the operations indicated are carried 
out infinitesimally slowly). 


[Figure 22.1; labels: “Piston”, “Membrane”] 

Fig. 22.1. Gabor’s engine [25]. See text for the standard operating procedure. The decision 
between the two branches (of which only one, the profitable one, is shown) can be 
made reversibly with the help of the device shown in Fig. 22.2. 


[Figure 22.2; labels: “Memory Key”, “Gabor’s Engine”] 

Fig. 22.2. Blueprint of a reversible measuring device for Gabor’s engine. The measurements 
can be done (or undone) by turning the crank on the right in the appropriate direction 
and pushing in or pulling out the “scale”. Thermodynamic reversibility is achieved in 
the limit of an infinitesimally slow operation. Faster controlled-not-like measurements can 
be carried out on a dynamical timescale by implementing the unitary evolution given by 
Eq. 22.14. The design shown above is similar to the Szilard’s engine contraption devised 
in Ref. [28]. 


22.4 Measurements and Decisions 


The purpose of the measurement is to establish a correlation between the state of the 
system and the record — the state of the few relevant bits of memory. In the context 
of this paper we shall focus on the measurements which correlate memory with a cell 
in the phase space or a subspace of the Hilbert space of the system (corresponding 
to the projection operator $P_i$). In concert with the usual requirements I shall 
demand that the collection $\{P_i\}$ of all the measurements be mutually exclusive 
($\mathrm{Tr}(P_i P_j) = 0$ for $i \neq j$) and exhaustive ($\sum_i P_i = 1$). To avoid problems associated with 
quantum measurements we shall also demand that the measured observables should 
commute with the density matrix of the measured system, $[P_i, \rho_S] = 0$. Thus, 
we shall allow for the best case [9] (from the demon’s point of view), with no 
additional thermodynamic inefficiencies associated with the reduction of the state 
vector introduced into quantum measurement through decoherence [10, 11, 28-31]. 


A measurement performed by the demon, when viewed from the outside, results 
in the correlation between the state of the system (i.e., the location of the particle in 
Gabor’s engine) and the state of the demon’s memory. The total entropy can 
be prevented from increasing, as the only requirement for a successful measurement 
is to convert the initial density matrix of the combined system-demon: 


$$\rho^{(0)}_{SD} = \rho_S \otimes \rho^{(0)}_D = \left( \sum_i p_i P_i \right) \otimes \rho^{(0)}_D \qquad (22.13a)$$


into the correlated [9, 10, 28-31]: 


$$\rho_{SD} = \sum_i p_i \left( P_i \otimes \rho^{(i)}_D \right) \qquad (22.13b)$$


Above, we have implicitly assumed that the measurement is exhaustive in the sense 
that further refinements will reveal a uniform probability distribution within the 
partitions defined by $P_i$. This need not be the case; it is straightforward to 
generalize the above formulae to the case when the different memory states of the 
demon are correlated with density matrices of the system. In any case, the entropies 
of $\rho^{(0)}_{SD}$ and $\rho_{SD}$ can, in principle, be the same, for there exists a unitary 
controlled-not-like evolution operator: 


$$U = \sum_i P_i \otimes \left( |\delta_i\rangle\langle\delta_0| + |\delta_0\rangle\langle\delta_i| \right) \qquad (22.14)$$


with $|\delta_i\rangle$ and $|\delta_0\rangle$ defined by $\rho^{(i)}_D = |\delta_i\rangle\langle\delta_i|$ and $\rho^{(0)}_D = |\delta_0\rangle\langle\delta_0|$, 
provided that the $\rho^{(i)}_D$ correspond to distinguishable (orthogonal) memory states 
of the demon, a natural requirement for a successful measurement. 


The statistical entropy of the system-demon combination is obviously the same 
before and after measurement, as, by construction of $U$, $H(\rho_{SD}) = H(\rho^{(0)}_{SD})$. Moreover, 
the measurement is obviously reversible: Applying the unitary evolution operator, 
Eq. 22.14, twice, will restore the pre-measurement situation. 


[Figure 22.3; flowchart nodes: “Insert Partition”, “Measure”, “Extract”, “Undo Measurement”] 

Fig. 22.3. Decision flowchart for the demon of choice. The branch on the left is profitable 
(and it is followed when the particle is “caught” in the small left chamber, see Fig. 22.1). 
The branch on the right is unprofitable, and, as explained in the text in more detail, 
the demon of choice cannot be “saved” by reversing only the unprofitable measurements. 


From the viewpoint of the outside observer, the measurement leads to a correlation 
between the system and the memory of the demon: The ensemble averaged 
increase of the ignorance about the content of the demon’s memory, 


$$\Delta H_D = H(\rho_D) - H(\rho^{(0)}_D) = -\sum_i p_i \lg p_i , \qquad (22.15)$$

(where $\rho_D = \mathrm{Tr}_S\, \rho_{SD}$ and $H(\rho) = -\mathrm{Tr}\,\rho \lg \rho$) is compensated for by the increase of 
the mutual information, defined as 

$$I_{SD} = H(\rho_D) + H(\rho_S) - H(\rho_{SD}) , \qquad (22.16)$$


so that $\Delta H_D = \Delta I_{SD}$ (see Refs. [29] and [33] for the Shannon and algorithmic 
versions of this discussion in somewhat different settings). 
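
The entropy balance of Eqs. 22.15 and 22.16 can be illustrated with a purely classical stand-in for the controlled-not-like operation of Eq. 22.14. What follows is a sketch under simplifying assumptions: the density matrices are taken to be diagonal, so they reduce to probability distributions over (cell, memory) pairs, and the names `READY`, `measure`, `marginal` and `mutual_information`, as well as the ratio of 1/4 for the small compartment, are hypothetical choices made only for this example.

```python
import math
from collections import defaultdict

READY = 0  # label of the blank (pre-measurement) memory state


def measure(joint):
    """Classical analogue of Eq. 22.14: swap READY <-> i in the memory
    whenever the system is in cell i. `joint` maps (cell, memory) -> probability."""
    out = defaultdict(float)
    for (cell, memory), prob in joint.items():
        if memory == READY:
            memory = cell
        elif memory == cell:
            memory = READY
        out[(cell, memory)] += prob
    return dict(out)


def entropy(dist):
    """Statistical entropy, in bits, of a distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, axis):
    """Marginal distribution of the system (axis 0) or of the memory (axis 1)."""
    out = defaultdict(float)
    for key, prob in joint.items():
        out[key[axis]] += prob
    return dict(out)


def mutual_information(joint):
    """Eq. 22.16 for diagonal density matrices."""
    return entropy(marginal(joint, 0)) + entropy(marginal(joint, 1)) - entropy(joint)


# Cell 1 is the small compartment, cell 2 the large one; memory starts READY.
before = {(1, READY): 0.25, (2, READY): 0.75}
after = measure(before)

print("joint entropy before/after:", entropy(before), entropy(after))
print("increase of memory entropy:",
      entropy(marginal(after, 1)) - entropy(marginal(before, 1)))
print("increase of mutual information:",
      mutual_information(after) - mutual_information(before))
print("measuring twice restores the initial state:", measure(after) == before)
```

In this toy model the joint entropy is unchanged by the measurement, the growth of the memory entropy equals the growth of the mutual information, and a second application of the same operation undoes the first.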


From the viewpoint of the demon the acquired data are definite: The outcome is 
some definite demon state $\rho^{(n)}_D$ corresponding to the memory state $n$, and associated 
with the most concise record (an increase of the algorithmic information content) 
given by some $\Delta K(n) = K(\rho^{(n)}_D) - K(\rho^{(0)}_D)$. 

The demon of choice would now either: (i) proceed with the expansion, extraction 
and erasure, provided that his estimate of the future gain: 


$$\Delta W = k_B T (\Delta H - \Delta K) = k_B T \Delta Z \qquad (22.17)$$

(with $\Delta H$ the decrease of ignorance and $\Delta K$ the increase of the size of the record, as in Eq. 22.5) 
was positive, or, alternatively, (ii) undo the measurement at no cost, provided that 
$\Delta W < 0$. An algorithm that attempts to implement this strategy for the case of 
Gabor’s engine is illustrated in Fig. 22.3. To see why this strategy will not work, we 
first note that the demon of choice threatens the second law only if its operation is 
cyclic — that is, it must be possible to implement the algorithm without it coming 
to an inevitable halt. 


There is no need to comment on the left-hand side part of the cycle: it starts 
with the insertion of the partition. Detection of a particle in the left-hand side 
compartment is followed by the expansion of the partition (converted into a piston) 
and results in the extraction of the work given by Eq. 22.7. Since the partition was 
extracted, the results of the measurement must be erased (to prepare for the next 
measurement), which costs $k_B T$ of useful work, so that the gain per useful cycle is 
given by Eq. 22.9. The partition can now be reinserted and the whole cycle can 
start again. 


There is, however, no decision procedure which can implement the goal of the 
right-hand side of the tree. The measurement can be of course undone. The demon 
— after undoing the correlation — no longer knows the location of the molecule 
inside the engine. Unfortunately for the demon, this does not imply that the state 
of the engine has also been undone. Moreover, the demon with empty memory will 
immediately proceed to do what demons with empty memory always do: It will 
measure. This action is an “unconditional reflex” of a demon with an empty mem- 
ory. It is inevitable, as the actions of the demon must be completely determined 
by its internal state, including the state of its memory. (This is the same rule as 
for Turing machines.) But the particle in Gabor’s engine is still stuck on the 
unprofitable side of the partition. Therefore, when the measurement is repeated, it 
will yield the same disappointing result as before, and the demon will be locked for- 
ever into the measure - unmeasure “two-step” within the same unprofitable branch 
of the cycle by its algorithm, which compels it to repeat two controlled-not like 
actions, Eq. 22.14, which jointly amount to an identity. 
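
The trap described above can be made concrete with a toy state machine. This is only an illustrative sketch: the rule table, the state labels (“small”, “large”) and the function names are assumptions introduced here, and only the unprofitable branch is modelled, with the partition left in place throughout.

```python
def demon_step(memory):
    """Next action of a deterministic demon; it depends only on the memory."""
    if memory is None:
        return "measure"            # empty memory: the unconditional reflex
    if memory == "small":
        return "extract and erase"  # profitable record: proceed with the cycle
    return "undo"                   # unprofitable record: reversibly unmeasure


def run_unprofitable_branch(steps):
    """Follow the demon while the molecule sits in the large compartment."""
    particle = "large"  # the partition stays in, so the molecule cannot move
    memory = None
    history = []
    for _ in range(steps):
        action = demon_step(memory)
        if action == "measure":
            memory = particle   # correlate the memory with the molecule
        elif action == "undo":
            memory = None       # uncorrelate; the engine itself is untouched
        else:
            break               # never reached while the molecule stays put
        history.append((action, memory, particle))
    return history


for entry in run_unprofitable_branch(6):
    print(entry)
```

Because the action is a function of the memory alone and undoing the measurement leaves the molecule where it was, the combined state repeats every two steps and no work is ever extracted.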


This vicious cycle could be interrupted only if the decision process called for ex- 
traction and reinsertion of the partition before undoing the measurement (and thus 
causing the inevitable immediate re-measurement) in the unprofitable right branch 
of the decision tree. Extraction of the partition before the measurement is undone 
increases the entropy of the gas by $k_B \lg[L/(L-\ell)]$ and destroys the correlation with 
the demon’s memory, thus decreasing the mutual information: The molecule now 
occupies the whole volume of the engine. Moreover, this occurs with no gain of useful 
work. Consequently, reversibly undoing the measurement after the partition is extracted 
is no longer possible: The location on the decision tree (extracted partition, 
“full” memory) implicitly demonstrates that the measurement has been carried out 
and that it has revealed that the molecule was in the unprofitable compartment; 
it can occur only in the right-hand branch of the tree. 


The opening of the partition has resulted in a free expansion of the gas, which 
squandered away the correlation between the state of the gas and the state of the 
memory of the demon. Absence of the correlation eliminates the possibility of 
undoing the measurement. Thus, now erasure is the only remaining option. It 
would have to be carried out before the next measurement, and the price of $k_B T$ 
per bit would have to be paid [6, 7]. 


One additional strategy should be explored before we conclude this discussion: 
The demon of choice can be assumed to have a large memory tape, so that it can 
put off erasures and temporarily store the results of its $N$ measurements. The tape 
would then contain $\sim N(L-\ell)/L$ 0’s (which we shall take to signify an unprofitable 
outcome) and $\sim N\ell/L$ 1’s. In the limit of large $N$ ($N\ell/L \gg 1$) the algorithmic 
information content of such a “sparse” binary sequence $s$ is given by [14-20]: 


$$K(s) \simeq -N \left[ \frac{\ell}{L} \lg \frac{\ell}{L} + \frac{L-\ell}{L} \lg \frac{L-\ell}{L} \right] \qquad (22.18)$$


Moreover, a binary string can be, at least in principle, compressed to its minimal 
record ($s^*$ such that $K(s) = |s^*|$) by a reversible computation [12]. Hence, it is 
possible to erase the record of the measurements carried out by the demon at a cost, 
per measurement, of no less than 


$$\langle \Delta W^- \rangle = k_B T [K(s)/N] . \qquad (22.19)$$


Thus, if the erasure is delayed so that the demon can attempt to minimize its 
cost before carrying it out, it can at best break even: The erasure cost $k_B T$ in Eq. 22.12 is 
replaced by $\langle \Delta W^- \rangle$ of Eq. 22.19. Since the average gain per measurement, 
$\langle \Delta W^+ \rangle$ (the gains of Eqs. 22.7 and 22.10 weighted by $\ell/L$ and $(L-\ell)/L$), 
equals $k_B T [K(s)/N]$ by Eq. 22.18, this yields: 


$$\langle \Delta W \rangle = \langle \Delta W^+ \rangle - \langle \Delta W^- \rangle = 0 . \qquad (22.20)$$
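
A rough numerical check of this break-even argument is sketched below. It rests on an assumption that is not in the original text: the minimal record is replaced by one explicit, computable encoding of a sparse tape (the number of 1’s followed by the index of the tape among all tapes with that many 1’s), whose length bounds the algorithmic information content from above up to an additive constant. The function names and the chosen values of N and of the compartment ratio are hypothetical.

```python
import math


def binary_entropy(x):
    """Entropy, in bits, of a binary outcome with probabilities x and 1 - x."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def record_bits(n, ones):
    """Length of an explicit record of a tape of n bits containing `ones` 1's:
    the count of 1's, then the index of the tape among all such tapes."""
    count_bits = math.ceil(math.log2(n + 1))
    index_bits = math.ceil(math.log2(math.comb(n, ones)))
    return count_bits + index_bits


def net_work_per_measurement(n, ell_over_L, kT=1.0):
    """Average gain minus delayed erasure cost, per measurement, in units of kT."""
    ones = round(n * ell_over_L)                    # profitable outcomes on the tape
    gain = kT * binary_entropy(ell_over_L)          # average of Eqs. 22.7 and 22.10
    erasure = kT * record_bits(n, ones) / n         # Eq. 22.19 with this encoding
    return gain - erasure


for n in (100, 10_000, 100_000):
    print(n, net_work_per_measurement(n, 0.01))
```

With this encoding the net work per measurement is slightly negative and approaches zero as the tape grows, which is the break-even behaviour expressed by Eq. 22.20.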


It is straightforward to generalize this lesson, derived from the example of Gabor’s 
engine, to other situations. The essential ingredient is the “noncommutativity” of 
the two operations: “undo the measurement” can be reversibly carried out only 
before “extract the partition.” The actions of the demon are, by the assumption of 
the Church-Turing thesis, completely determined by its internal state, especially its 
memory content. Demons are forced to make useless re-measurements. Santayana’s 
famous saying that “those who forget their history are doomed to relive it” applies to 
demons with a vengeance! For, when the demon forgets the measurement outcome, it 
will repeat the measurement and remain stuck forever in the unprofitable cycle. One 
could consider more complicated algorithms, with additional bits and instructions 
on when to measure, and so on. The point is, however, that all such strategies must 
ultimately contain explicit or implicit information about the branch on which the 
demon has found itself as a result of the measurement. Erasure of this information 
carries a price which is on the average no less than the “illicit” gains which would 
violate the second law. 


22.5 Conclusions 


The aim of this paper was to exorcise the demon of choice — a selective version 
of Maxwell’s demon which attempted to capitalize on large thermal fluctuations by 
reversibly undoing all of the measurements which did not reveal the system to be 
sufficiently far from equilibrium. I have demonstrated that a deterministic version 
of such a demon fails, as no decision procedure is capable of both (i) reversibly 
undoing the measurement, and, also, of (ii) opening the partitions inserted prior to 
the measurement to allow for energy extraction following readoff of the outcome. 


Our discussion was phrased (save for an occasional reference to density matrices, 
Hilbert spaces, etc.) in a noncommittal language, and it is indeed equally 
applicable in the classical and quantum contexts. As was pointed out already some 
time ago [9, 10], the only difference arises in the course of measurements. Quantum 
measurements are typically accompanied by a “reduction of the state vector”. It 
occurs whenever an observer measures observables that are not co-diagonal with the 
density matrix of the system. It is a (near) instantaneous process [34], which is nowa- 
days understood as a consequence of decoherence and einselection [19, 28, 30-34]. 
The implications of this difference are minor from the viewpoint of the threat to 
the second law posed by the demons (although decoherence is paramount for the 
discussion of the interpretation of quantum theory). It was noted already some time 
ago that decoherence (or, more generally, the increase of entropy associated with the 
reduction of the state vector) is not necessary to save the second law [9]. Soon after 
the algorithmic information content entered the discussion of demons [12, 21] it was 
also realised that the additional cost decoherence represents can be conveniently 
quantified using the “deficit” in what this author knew then as the ‘Groenewold- 
Lindblad inequality’ [35, 36], and what is now more often (and equally justifiably) 
called the ‘Holevo quantity’ [37]: 


$$\chi = H(\rho) - \sum_i p_i H(\rho_i) \qquad (22.21)$$


which is a measure of the entropy increase due to the “reduction”. The two 
proofs [36, 37] involving essentially the same quantity appeared almost simultaneously 
and independently, and were motivated by (at least superficially) 
quite different considerations. 


We shall not repeat these discussions here in detail. There are however several 
independently sufficient reasons not to worry about decoherence in the demonic 
context which deserve a brief review. To begin with, decoherence cannot help 
the demon as it only adds to the “cost of doing business”. And the second law is 
apparently safe even without decoherence [9]. Moreover, especially in the context of 
Szilard’s or Gabor’s engines, decoherence is unlikely to hurt the demon either, since 
the obvious projection operators to use in Eq. 22.14 correspond to the particle being 
on the left (right) of the partition, and are likely to diagonalise the density matrix of 
the system in contact with a typical environment [9] (heat bath). (Superpositions of 
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states corresponding to such obvious measurement outcomes are very Schrodinger 
cat-like, and, therefore, unstable on the decoherence timescale [34].) Last not least, 
even if demon for some odd reason started by measuring some observable which does 
not commute with the density matrix of the system decohering in contact with the 
heat bath environment, it should be able to figure out what’s wrong and learn after 
a while what to measure to minimise the cost of erasure (demons are supposed to 
be intelligent, after all!). 
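
The last point can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the particle is modelled as a two-state (left/right) system with a real density matrix, the “cost” is taken to be the entropy added by a non-selective projective measurement, and the function names and the rotation angle `theta` are hypothetical. A measurement in the basis that diagonalises the decohered density matrix adds nothing, while a rotated, non-commuting one adds entropy.

```python
import math


def shannon_bits(probabilities):
    """Shannon entropy in bits, skipping vanishing probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 1e-12)


def eigenvalues_2x2(rho):
    """Eigenvalues of a real symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = rho[0][0], rho[0][1], rho[1][1]
    mean = (a + d) / 2.0
    half_gap = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    return (mean + half_gap, mean - half_gap)


def outcome_probabilities(rho, theta):
    """Outcome probabilities for a measurement in the basis
    (cos t, sin t), (-sin t, cos t); theta = 0 is the left/right basis."""
    a, b, d = rho[0][0], rho[0][1], rho[1][1]
    c, s = math.cos(theta), math.sin(theta)
    q0 = c * c * a + 2.0 * c * s * b + s * s * d
    q1 = s * s * a - 2.0 * c * s * b + c * c * d
    return (q0, q1)


def entropy_increase(rho, theta):
    """Entropy (bits) added by a non-selective measurement in the rotated basis."""
    before = shannon_bits(eigenvalues_2x2(rho))
    after = shannon_bits(outcome_probabilities(rho, theta))
    return after - before


# Decohered state: diagonal in the left/right basis (assumed example values).
rho = [[0.9, 0.0], [0.0, 0.1]]
print(entropy_increase(rho, 0.0))          # commuting measurement
print(entropy_increase(rho, math.pi / 4))  # maximally non-commuting measurement
```

In this toy model a demon that scans over `theta` and keeps the angle with the smallest increase would settle on the left/right basis, which is the learning process the paragraph alludes to.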


So decoherence is of secondary importance in assuring the validity of the second
law in the setting involving engines and demons: Entropy cannot decrease even
without it! But decoherence can (and often will) add to the measurement costs, and
the cost of decoherence is paid “up front”, during the measurement (and not really
during the erasure, although there may be an ambiguity there; see the quantum
calculation of the consequences of decoherence for an erasure-like process in Ref. [38]).
However, in the context of dynamics, decoherence is the ultimate cause of entropy
production, and, thus, the cause of the algorithmic arrow of time [33]. Moreover,
there are intriguing quantum implications of the interplay of decoherence and
(algorithmic) information that follow: Discussions of the interpretational issues of
quantum theory are often conducted in a way which implicitly separates the infor- 
mation observers have about the state of the systems in the “rest of the Universe” 
from their own physical state — their identity. Yet, as the above analysis of the 
observer-like demons demonstrates, there can be no information without representa- 
tion. The observer's state (or, for that matter, the state of its memory) determines 
its actions and should be regarded as an ultimate description of its identity. So, to 
end with one more “deep truth”: existence (of the observer’s state, and, especially,
of the state of its memory) precedes the essence (the observer’s information, and, hence,
its future actions).
(such as my discussion of the decoherence timescale which was eventually published
as Ref. [34]) the room was filled to perhaps a third of the capacity. However, when I 
walked in in the middle of the afternoon coffee break, well in advance of Feynman’s 
talk, the room was already nearly full, and the air was thick with anticipation. A 
moment after I sat down in one of the few empty seats, I saw Feynman come in, 
and quietly take a seat somewhere in the midst of the audience. More people came 
in, including the organisers and the session chairman. The scheduled time of his 
talk came... and went. It was five minutes after. Ten minutes. Quarter of an hour. 
The chairman was nervous. I did not understand what was going on: I clearly
saw Feynman’s long grey hair and an occasional flash of an impish smile a few rows
ahead.


Then it struck me: He was just being “a curious character”, curious about what
would happen... He did what he had promised, showing up for his talk on (or even
before) time, and now he was going to see how events would unfold.


In the end I did the responsible thing: After a few more minutes I pointed out 
the speaker to the session chairman (who was greatly relieved, and who immediately 
and reverently led him to the speaker’s podium). The talk (with the content, more 
or less, of Ref. [39]) started only moderately behind schedule. And I was
immediately sorry that I did not play along a while longer — I felt as if I had given 
away a high-school prank before it was fully consummated! 
in quantum computers, 180, 193
round-off and truncation, 51, 118, 269, 347
in transistor, 232
Error-correcting codes, 46

and computation, ix, 3 et seq, 45, 117
and Connection machine, 257
and credit, xii, 4
and electrodynamics, 22 et seq
as explainer, 263

and functioning of brain, 5 

and logarithm algorithm, 260 

mythology surrounding, 25 

prizes, miniaturization, x, 75 

and problem solving, xxi, 264 

“proof by intimidation”, xxi, 25 

wake, 27 
Feynman Lectures on Computation, ix, xix 
Feynman Lectures on Gravitation, 22, 25, 26 
Feynman Lectures on Physics, 22, 25, 120 

and electrodynamics, 22 

motivation for, 28 

as poems, 393 
Field, 22, 125 

analogue, in CA, 122 

electric, 22, 30 

electric, discretizing, 152 

electric, in gate oxide, MOS transistor, 94 

electric, maximum, in drain junction, 101 

existence, 310 

forces in, 126 

intensity, and rest mass, 127 

magnetic, 22, 30, 311 

magnetic, “reality” of, 23 

measurement, and reality, 316 

potential, on a charged particle, 125 

potential, and state change rules, CA, 125 

programmable gate array, 291 

quantum, and molecular stability, 120 

spherical force, cellular array, 125 

theory, cellular automaton model as, 269 

theory, gauge, 311 

theory, and solid state phenomena, 140 
Finite state machines, 263 
Fitzhugh-Nagumo reaction-diffusion dynamics, 290

Floating point, 260, 261, 262, 264, 293
Fluids 

fast flow, 289 

immiscible, 289 

modelling, CA, 262, 285 et seq 

multiphase, 290 


vortex patterns, in flow, 7 
vortex shedding, 288 
Flux quantum (fluxoid), 23, 31 et seq 
magnetic, 31 et seq, 311, 316 
multi-turn coil, 39 
quantization of, 33 
Fokker-Planck equations, 265 
Forced categorizer, 16 
Fourier transform 
classical, 198 
Quantum Fourier Transform (QFT), 
see Quantum Fourier Transform 
Fractals 
and fashions in science, 87 
table structure, chain computer, 385 
Fredkin gate, xix, 119, 234, 338 
Free will, 121, 150, 152 
Function 
and computation capacity, 356 
evaluation, in quantum computer, 196 
Green’s, 29, 36 
period, 198, 203 
Fungibility, 349 et seq 


Gabor’s engine, 399 
Game of Life, xvii, 122, 275, 297 
and Connection Machine, 261 
glider guns in cellular automaton, 295 
non-invertibility of, 276 
Ganglia, 13 
Gas, 284 
HPP, 286 
lattice, 358 
lattice gas automaton, 286, 293, 386 
photon, 289 
refraction, 386 
TM, 287 
waves in, 286 
Gate 
array, 291 
cyclic and acyclic, 158 
Fredkin, xix, 119, 234, 338 
logic, 178 
logic, and conservation of information, 119


logic, Shor’s algorithm, 204 

quantum, 178, 193 

quantum, networks, 155, 158 

reversibility of, 233 

in space-time, 364, 368 

time, ion trap computer, 211 

two-qubit CNOT, 185 
Gate-oxide, in transistor 

capacitance, 95 

thickness, 94 


General Number Field Sieve algorithm, 203 


Geometrodynamics, 311, 315 
Gestalt, 12, 17 
“Ghotons”, 126 
Gliders 
and arbitrary logic circuits, 276 
in Critters, 279 
in Game of Life, 276 
guns, 276, 295 
lifetime of, and invertibility, 277, 279 
Go, 238 
Gödel theorems, 155
Graded junction approximation, 101 
Grammar, 151 
Graph isomorphism, 179 
Greatest Common Divisor, 202, 375 
Green’s function, 29, 36 
Groenewold-Lindblad inequality, 406 
Group, Abelian and non-Abelian, 201 
Grover’s database search algorithm, 
159, 192, 201 
optimality of, 201 
Halting problem, 394 
Hadamard operation, 198, 200 
“Hadrian’s Palace”, 243 
Hamilton’s principle of least action, 
30, 349 et seq 
and combinatorics, 391 
Hamiltonian, 141, 180, 194, 380 
behaviour, departure from, 86 
of crystal lattice, 341 
Feynman’s prescription, 4, 156 


isolated qubit, 194 
of lumped string, 387 
systems, Lyapunov exponent, 48 


task, in quantum robot, 155, 160, 162, 168 


transitions, ion trap computer, 208 
two-qubit gate, 185 

Hamming distance, 13, 16 

Hanning windows, 56 


Hardy, Pazzis and Pomeau (HPP) gas, 286, 292 


diffusion model, 287 

Heat bath, 270 
and cellular automata, 273, 290 
and economic modeling, 246 


equilibrium, and particle acceleration, 394 


invertibility of, CA, 273 

Heat death, 281 

Heisenberg equations, 217 

Hidden variables, 142, 148, 268 

High Performance Fortran, 241 

Hilbert space, 79, 166, 179, 180, 317 
projections in, and measurement, 196 
of qubit, 194 
two-dimensional, 177 

History 
in chain computer, 382, 389 
of computational limits, x, 78 et seq 
recording, by quantum robot, 165 
Santayana and, 405 

Holevo quantity, 406 

Hollerith’s census counter, 78 

Hot-clocking, 79, 230 

Hughes Research Laboratories, 22 

Hydrodynamic flow, 288, 293 
two-dimensional, 35, 38 


Index of refraction, 387 
Inductance, 31 et seq, 228 
of loop of wire, 35, 40 
mutual, 35, 38 
Inertia 
of classical vs collective systems, 39 
electrodynamic, 37 
of lumped string, 385 
Information
and causality, 136
classical, 181, 268
conservation of, 119, 129, 350
density of, 119
dynamics, 241
encrypted, 188, 201
and energy dissipation, 78
and entropy, 78, 297, 313, 349, 371
erasure of, 78, 395, 397 et seq
flow in time, 150
fungibility of, 350
locally finite, theories, 118
loss, via noise, 80
mechanics, of photons, 124
mutual, 403
physical nature of, 77 et seq
processing, finite, 297
quantity in volume of space-time, 85, 119, 122, 262
quantum, 88, 177 et seq, 191 et seq
Shannon, 15, 78
on small scale, 66
storage algorithm, in neural net, 10
storage efficiency, 15
theory, see Information theory
transfer, in Feynman computer, 237
Information theory, 46, 78, 181, 309 et seq, 322
algorithmic, 397
finite, 118
quantum, see Quantum Information theory
Zurek’s triangle inequality, 322
Input/output
and data carry-through, quantum computer, 196
general computer, 352
as “measurement”, 79
neuron system, 11
quantum computer, 196, 205, 236
quantum gate, 178
and quantum jumps, 208
quantum robot, 158, 162, 168
relativistic program, 364
statistical, 284
Insulating substrates, 109
Integration of outer planet orbits, 47, 48, 50 et seq
Intel, xii
Interaction
laser-ion, 207
primitive, in communication, 185
representation, 194
Interconnect, 111, 240
programmed, 291
Interdisciplinary research, 241, 242, 249
Interference, quantum, 192, 197, 200
quantum algorithms, and, 200
and Shor’s theorem, 198
Internetics, xvii, 241 et seq
academic implementation of, 253
defined, 242
and physics, 242
Invertible dynamics, 267, 271 et seq, 356, 372
and excitation lifetime, CA, 277
Game of Life, 276
pseudo-randomness, 284, 287
Invariance, relativistic
macroscopic, LGA models, 296
and momentum conservation, 296
observable scale, CA, 262
Ion
Beryllium, quantum computer, 193
calcium, 206
centre of mass motion, 207
fluorescence, and bit value, ion trap computer, 208
and miniature writing, 64
as quantum register, 207
spacing, ion trap computer, 207
trap computer, see Ion trap computer
Ion trap quantum computation, 157, 191, 205 et seq
Cirac-Zoller scheme, 193
decoherence in, 211
limitations of, 191, 211 et seq
logic gates in, 193, 208 et seq
output, 208
vs NMR computer, 215 
Irrational numbers, 314 
Ising Model, 8, 13, 122, 270, 297 
Bennett’s 1-D, 272 
bond energy, 270 
cellular automaton, 270 et seq 
equilibrium, 271 
neural net, 8, 13 
parallel update in, 271, 272 
phase change behaviour, 270 
three-dimensional, with heat bath, 273 
“It from Bit”, xviii, 309 et seq 
and consciousness, 320 
examples, 311 
“reality is theory”, 318 


Jacobian, 49 

Jacquard loom, 78 

Java, 241, 253 
and parallelism, 248 

Jayne’s law, 376 

Josephson junction, 83 

Jupiter 
energy error, orbital integration, 52 
longitude error, orbital integration, 51, 52 


Key, cryptographic, 201 
see also Cryptography, RSA 
quantum, 188 
Kirkwood gap, 57


Lagrangian, 369, 371, 380 
lumped string, refraction, 386 
mechanics, 217 
Lamb-Dicke parameter, 209 
Landauer’s principle, 395, 397 
Laser 
cooling, 207 
intensity, 210 
intensity, and decoherence, 212 
interaction with ions, qubit state-change, 
208, 210 


ion trap computer, 207 
phase stability, 211 
Lattice 
anisotropies in, 124 
body-centred cubic, 381 
closed, 158 
crystal, Hamiltonian, 341 
dynamics, 267 
gas automaton, 286, 358 
hexagonal, 262 
isotropic propagation in, 124, 262 
liquid, 124 
of qubits, 156 
random, 262 
space-time as, 134, 260, 367 
spin, probabilities in, 143 
sublattices, 270, 298 
three-dimensional Ising, 273 
two-dimensional Ising, 270 
see also Cellular array, automata 
Law of large numbers, 391 
Legendre’s congruence, 203 
Libration, 47 
Light cone, 364 
Limit points, neural net, 8, 12 
Liouville’s theorem, 350 
Lithium, 134 
Locality 
finite information theories, 118 
of interaction, in lattices, 267, 270 
of laws of Nature, 138, 269, 291, 296 
in probabilistic computer, 133, 138, 148 
head positioning, quantum robot, 173 
operators in quantum robot, 156, 160, 163 
state rules, cellular vacuum, 122 
Logic 
array, programmed logic, 46 
circuits, gain of, 108 
circuits, glider guns, Game of Life, 276 
and clocking, 84 
CMOS digital, 105 
conservative, 238, 338 
crystalline arrays of, 267 
see also Cellular automata
gate time, ion trap computer, 211 
operations, and arithmetic, 196 
particles in potential well, 83 
quantum, 85 
speed, in MOS technology, 105 
see also Gates 
Lorentz 
in discrete space-time, 262 
invariance, 124, 296, 364, 366 
transformation, and transliteration, 369 
Lubrication, and miniature machines, 71 
Lumped string, 383 et seq 
Hamiltonian, 387 
inertia of, 385 
Lagrangian, 386 
refraction, 386 
Lyapunov exponents, 48, 53 et seq 
of chaotic trajectories, 48, 55 
estimation of, 49, 53 
Hamiltonian systems, 48 
of Pluto, 53 
quasiperiodic trajectories, 49 


Macroscopic coarse graining, 246 
Macroscopic quantum tunneling, 81 
MACSYMA, xviii 
Magnetic 

field, 22, 30, 311 

field, “reality” of, 23 

flux, 31, 33 et seq 

properties, and miniaturization, 70 
Manhattan Project, 259 
Many-worlds interpretation, 150 
Majority logic, 84, 85 
Marchant calculator, 117 
Master-slave system, 71, 72 
Mathematica, 344, 350 
Mathematics, 26, 67, 77, 85, 117 

accuracy of, 85 

continuum in, 314 

limitations of, 155 

theorems in, 322 


Matter, and quantum theory, 32, 36 
Maximal speed 
in CA, 122 
in LGA, 230 
see also Speed of light 
Maxwell’s Demon, xiii, xix, 22, 77, 393 et seq 
intelligence of, 395 
quantum mechanical, 157
Szilard’s analysis, 77 
Maxwell’s Equations, 22, 26, 29 et seq 
etheric interpretation, 32 
“Feynman’s proof” of, 217 
Newton’s force law and, 37 
McLellan Machine, x, xi, xii, xiv 
Mean free path, 230 
Measurement, 151, 311, 316, 318, 393 et seq 
as system-recorder correlation, 393, 402 
and energy, 79 
as projections in Hilbert space, 196 
and Quantum Fourier Transform, 199 
quantum, and laws of physics, 86, 311, 316 
and quantum robot, 157, 158 
readout, quantum computer, 196 
thermodynamic cost of, 395 et seq 
see also Observer-participancy 
Memory 
ambiguities, neural net, 17
associative, 4, 259 
bistable elements, 80, 81 
in CA machines, 292 
content-addressable, neural networks, 
8 et seq 
dynamic random access, 80, 294 
erasure, cost of, Gabor engine, 399 
forgetting, in neural net, 1 
limitations, in quantum computer, 211 
in parallel computers, 226 
and quantum laws, 121 
quantum robot, 156, 161, 165, 167 
redundancy in, 80 
ROM, programmable, 353
ROM, and space-time, 382 
running and permanent, 168 
Shor’s algorithm, 204
silicon-based content-addressable, 15 
stability of, 14, 85 
Swanson’s element theory, xii, 80 
in thermodynamic demons, 398, 402 
time-sequential, 8 
unlimited, 85 
Von Neumann architecture, 226 
Message-Passing Interface (MPI), 241, 247 
Metal Oxide Semiconductor (MOS) 
technology, 45 
future developments in, 93 et seq 
“Microcosm”, 25 
Miniaturization, 63 et seq 
computers, 68 
by evaporation, 69 
MOS technology, 93 et seq 
writing, 63 
see also Scaling 
Modulus, 202 
factoring, 204 et seq 
Molecular Dynamics, discrete, 284 et seq 
and fluid modeling, 285 
Momentum 
of collective system, 37 
conserving collision, 
Molecular Dynamics, 285 
electrodynamic, 36, 39 
of electron, 32 
exchange, cellular array, 126 
force and, 37 
particulate, in cellular automata, 126 
particle, simple computer, 378 
quantization of, 127 
Monte Carlo simulations 
ferromagnetism, 237 
neural net, 13
Moore’s law, xii, 93 
Multimedia, 254 
Murphy’s law, 24, 376 


NAND gate, 233 
Nanotechnology, 15, 120 


machine miniaturization, 70 et seq 
National Science Foundation, 249 
Neptune, 47, 56, 57 

and Pluto, 47 

power spectrum, 56 
Neural networks, x, xvii, 7 et seq, 259 

and Connection Machine, 259 

and integrated circuits, 7, 18 

efficiency of storage, 15 

linear associative, 10 

biological interpretation, 10 

“familiarity” in, 16 

“forgetting” in, 16 

and Perceptron, 9 

synchronous processing in, 12 
Neuroanatomy, 8, 10, 18, 25 
Neuron, 7 

firing rate, 11 

in neural nets, 9 et seq 

states of, 9 

strength of connection, 9, 259 
Newton’s laws, 36, 37, 42, 48, 120, 123 
No-cloning theorem, 159, 183, 187, 215, 311 
Noise 

presence in digital machines, 4 

Gaussian description, 16 

and memory stability, neural net, 14 

“noisy” computer, 352 

thermal, and information loss, 80 
Non-locality 

CNOTs, in quantum teleportation, 185

particles, 126 
Non-separability, quantum theory, 297 

in quantum robot, 157 
NOT 

gate, 178, 233 

operation, 195 

reversibility, 195 
NP problems, 179 
Nuclear magnetic resonance, quantum computer, 206, 215

Number operator, 141 
Observer-participancy, 309, 313 et seq
and making of meaning, 318
Operations
arithmetic, from logic, 196
controlled-sign-flip, 210
logic, ion trap computer, 209, 213
logic, CA, 269
non-Boolean unitary, 197
and Quantum Fourier Transform, 198, 200
on qubit, ion trap computer, 208
Operators
annihilation, 141
annihilation and creation, in ion trap computer, 209
density, regional environment, quantum robot, 159
number, 141
step, distinct path generating, 170
step, in quantum robot, 162
step, locality conditions on, 165, 185
Packet, in cellular array, 123
Pantograph, 72
Paradoxes
EPR, 145
of quantum theory, 151, 177
Parallel processing, xv et seq, 226, 241 et seq, 257 et seq
asynchronous, 8, 18, 242
“barrier to change”, 245
BASIC and, 261
databases, 248
and integer factoring, 203
porting sequential code, 247
quantum computer, 159, 200
tasks in, 243
update, Ising model, 270
see also Connection Machine
Parallelism
“automatic”, 247
and domain decomposition, 243
massive, 109, 244, 252
problems with, xv, 241, 247 et seq
quantum, 78, 86, 87, 192, 200, 295
uptake of, 241 et seq
Parity, 279
Particle
charged, in potential field, 125
collision, BBMC, 282, 369
and computation capacity, 369
conservation, in CA, 278, 282, 358
creation, and radiation, 128
density matrix for, 142
diffusing, 137
elementary, acceleration of, 394
in electromagnetic field, 30, 37
existence, 310
free, simple computer, 376
massive, 127
measurement on, Gabor engine, 399
non-local, 126
in potential well, as logic element, 83
quantum, non-relativistic, and Maxwell’s equations, 217
spinless, in quantum robot, 156, 166
with spin, classical, 270
Partitioning
in cellular automata, 277, 285, 287
in “computer”, 352
entropy of, 357
and hardware, 293
site, 287, 293
and sublattices, 277
Path integrals, in quantum physics, 21, 260, 371
Pauli matrices, 141
Perceptron, 9
Permanent, of matrix, 179
Permeability of free space, 36
Phase
accumulation, 33, 35, 39, 311
and action, 371
and scaling, collective systems, 37
space, see Phase space
in superconducting wire, 41
transitions, in complex systems, 246, 248
Phase space
attractors in, 15
configuration, work extracted from, 395
e-folding time of divergence, 47, 53, 57 
flow, and content-addressable memory, 7 
Hamiltonian, 371 

Lyapunov exponents, 48 

trajectories in, 48 et seq, 371 et seq, 389 
two-particle, 53 
Photons
and base-1 representations, 124
beams, ordinary and extraordinary, 146
in cellular array, 124
in cryptography, 188
and data compression, 183
emission, ion trap computer, 206, 211
“exchange”, 126
gas, 289
from H-transition, 147
information accessible from, 151, 194
measurement on, 311, 319
momentum, and wave number, 313
non-linear interactions and low energy channel, 83
red shift of, 125
two-photon correlation experiment, 147
polarization, 145, 177, 311
Physics
and action, 349 et seq
and chemistry, 67
of computation, 77 et seq
continuum in, 314, 316
conflicts with psychology and religion, 121
discrete, reversible systems, 337
foundations of, 29
as information, 309 et seq
and internetics, 242
and language of information, 313
laws of, 75, 86, 118, 133, 152, 241, 275, 314
laws of, limits on information handling, 78
laws, reversibility of, 134
laws of, and decoherence, 211
level-upon-level structure, 322
and locality, 138, 269, 291, 296
microscopic invertibility, CAs, 270, 294, 350
physical procedures as instructions, 171, 267, 363
relativistic invariance of, 295
simulating with computers, 133 et seq, 268, 275
“theory of everything”, 118
see also Classical physics
Pigeon-hole principle, 354
Planck
“area”, 312
constant, 32, 313, 370, 394
length, 315


Planetary orbits, integration of motion of, 46, 47 et seq, 363
1 million year, 50 
3 million year, 51 
200 million year, 48, 50 
845 million year, 47, 50 
outermost planets, 47 
power spectra, 59 
sources of error in, 50 
step size, 51, 59 
classical orbits, 120 
see also individual planets, Solar system 


Pluto, 47 et seq 


“Kepler Plutos”, 54

longitude of perihelion, 48, 55 
Lyapunov exponent of, 53 

and Neptune, 47 

orbital elements, 47, 48, 55 et seq, 120 
power spectrum, 56 


Positrons, 136 
Potential 


axial, ion trap computer, 207, 212 
field, in CA, 125 

non-invertible, 290 

surface, in MOS transistor, 94, 103 
valleys, and memory, 81 

scalar, 21, 22 

vector, 21, 22, 23, 29, 34 et seq 
vector, in superconducting wire, 40 
time-modulated, 77, 83 et seq 


Power spectra, 55, 56 
broadband components, 56
of chaotic trajectory, 56
of quasiperiodic trajectory, 55
Precision
arbitrary, and unlimited memory, 85
arbitrary Hamiltonians, construction, 4
in cellular automata, 124
and classical behaviour, 86
effects on science of limitations, 86
and miniaturization, 72
and origin of quantum phenomena, 118
in physical measurement, 135
threshold, and unlimited quantum computation, 193
Probabilistic systems, simulating, 136
Probability
Bayesian, vs frequency interpretation, 318
as human concept, 317
mathematical properties, 142
negative, 144, 148
origin, in quantum theory, 150
paradoxical, 151
simulating, 136
in spin model, 143
successful computation, ion trap computer, 213
transition, probabilistic computer, 139
Program, 351, 364, 375
automatic debugging, 225
chain computer, 381
high-level languages, 45
interconnections, gate array, 291
relativistic, 366
relativistic, software and hardware, 366
statistical, 352
symbolic computation, 117
Programmed Logic Array, 46
Propagation vector, 32
in superconducting wire, 41
Proton, mass of, 260
Psychology
and Feynman, 117
and functioning of computers, 151
quantum, 121
Public key, 188, 201, 203
see also Cryptography
Punch through, 94, 100, 104
and drain-induced barrier lowering, 102
and substrate doping, 104

Quantization, 32, 33, 127
and discretizing, 152
see also Flux
Quantum bits (qubits), 82, 86, 160, 172, 177, 191
addressing, ion trap computer, 207, 211
addressing, technical limits on, 211
“bus”, in ion trap computer, 208
and classical bits, 184
control, 12
general state of, 196
ion trap computer, 206, 208, 210
lattice of, 158
“Raman”, 210
schemes, for quantum computation, 206, 210
“Quantum certainty”, 121
Quantum Chromodynamics (QCD), xvii, 118, 260, 262
Feynman’s program, 261, 264
Quantum computation, 79, 119, 139 et seq, 155 et seq, 179 et seq, 191 et seq
basic concepts of, 194 et seq
experimental, 191, 192, 205 et seq
function evaluation in, 196, 198
gates in, 195
general model of, 192
hardware, 205 et seq
ion trap, see Ion trap computer
and cryptography, 181, 192, 201, 216
result of, 196
“vacuum tube” era of, 217
Quantum data compression, 183
Quantum dots, 206
Quantum Encoder and decoder, 183
Quantum Electrodynamics (QED), 117
cavity, 206
Quantum field theory, 158
Quantum Fourier transform, 198, 204
efficient, over Abelian groups, 201
four qubit, 199 
“shift invariance” of, 200 
Quantum information theory, 177 et seq 
transmission of, 181 
Quantum logic, 85 
Quantum psychology, 121 
Quantum robot, xiii, 155 et seq 
components, 156 
input/output, 162 
models of, 161 
tasks, see Tasks 
ballast motion in, 165 
Quantum spin models, 284, 297 
Quantum systems, 
average energy of, 370 
awareness of, 156, 170 
collective, 39 
and conventional computation, 196 
discrete, 298 
entropy of, 268 
equilibrium properties, 268 
intelligent, 156, 171 


macroscopic, and electrodynamics, 29 et seq 


non-separability of, 297


simulation by classical systems, 46, 142, 150 


see also Quantum Theory 
Quantum teleportation, 183, 184 et seq 
irreversibility of, 184 
Quantum theory, 30, 32, 42, 74, 136, 
142, 309 et seq 
and classical information, 268 
complementarity, 309, 317 
EPR experiment, 145 
and everyday world, 177 
and existence, 309 et seq 
failure of, 217 
and limits on computing, 225 et seq 
path integrals in, 21, 371 
reversibility, 77, 235 
self-validation of, xiii, 155 
simulation by computers, 133 et seq, 
268, 275, 295 


and “thermodynamic demonology”, 395 


and uncertainty, 119 

universality of, 155 

see also Decoherence, Entanglement,
Hilbert space, Physics,
Quantum computation, Quantum systems,
Uncertainty, Wave function
Quark, 260 
as information storage unit, 236 
Quasiperiodic behaviour, 48, 49, 55 
Quasistatic approximation, 30 
Qubit, see Quantum bit 


Rabi 
frequency, 209 
oscillations, 208 
Radiation, 128 
Raman 
qubits, 210, 214 
transitions, 208, 213 
transitions, and decoherence, 213 
Random 
number generator, 378 
walk, 361, 379 
Randomness, algorithmic, 393 et seq 
Real number system, 85, 178 
Redundancy, 80, 183 
Refraction, of lumped string, 386 
Registers 
input, in quantum computer, 196 
ion trap quantum computer, 157, 207 
NMR computer, 215 
quantum, L-qubit, 191, 195, 197, 198 
quantum, time evolution of, 195 
qubit, quantum robot, 161 
reversible, 85 
Shor’s algorithm, 204 
shift, 79, 85 
Relativity 
special, 124, 295, 313, 364 et seq 
special, and microscopic hardware, 367 
general, 364 
“twin paradox” in, xx 
see also Lorentz
Renormalization, 122 
Rest mass, 127 
Reverse Polish notation, 338
Reversibility 
of natural laws, 134, 235 
in quantum robot, 166 
of quantum teleportation, 184 
of state rules, 128 
of tasks, see Tasks 
see also Reversible 
Reversible 
computation, 4, 46, 79, 80, 82, 119, 
134, 235, 239, 337 et seq 
difference Schrödinger equation, 342
function evaluation, quantum computer, 196 
logic gates, 233 
measurement, 393, 399 
microscopic, 241 
machines, and speed of computation, 239 
Reynolds number, 289, 293
Ring oscillators, 105 
Robots, quantum, see quantum robots 
Rogers Commission, 26
RSA 
cryptosystem, 201 et seq 
factorization, in quantum computer, 201 
problem, 202 
RSA129, 201 
RSA130, 203 


Scaling, 23, 24 
momentum with charge, 
collective system, 37 
in neural networks, 14 
interaction, in collective system, 39 
of features, in MOS technology, 93 et seq 
laws, 94, 96 
threshold, 102 
SCHEME, xxii, 45 
Schrödinger’s cat, 407
Schrödinger equation, 139, 170, 194, 317
asymmetric difference equation, 341


reversible difference equation, 337 et seq 
time reversal symmetry of, 195 
Science 
effects on due to limited precision, 8 
fashions in, xii, 77, 82, 87 
Shannon’s theorems, 4, 351 
noiseless coding, 397 
Shor’s algorithm, 159, 192, 200, 201 et seq 
Sideband cooling, 208, 216 
Simon’s algorithm, 192, 200 
Skin depth, 41, 74 
Solar system 
age of, 47, 57 
as calculator, 237 
and parallel computing, 243 
stability, 47, 50, 58 
Sound 
barrier, 231 
supersonic flow simulation in LGAs, 289 
waves, in HPP gas, 286 
Space-time, 135, 315, 321 
“cells” in, 122, 364 
crystalline, 122 
density of gates in, 368 
dimensionality, 296 
discrete, 119, 126, 134, 161, 262, 269 
events and gates, cellular automata, 364 
information content, 85, 262 
inseparability of space and time, 119
interconnection of points in, 136 
at microscopic level, 309 
metric, 367 
as ROM, 382 
simulation of, 135 et seq 
topology of, 296 
volume, 368 
worldline, 365, 378 
Specific ergodicity, 356 
lattice gases, 358 
Speed of light, 24, 69, 123, 125, 313, 364 
and limited word length, 119 
and “abbreviation” in CA, 127 
anisotropic, in lattice, 134 
finite, 296

Speed-up 
by quantum computers, 155, 159, 177, 179, 192, 197, 268

parallel computing, 227 
quadratic and exponential, 179, 201 
square root, 201 

Spin 
classical, 270 
interacting, 144 
lattice, 140, 141 
lattice, simulation of, 268 
one-half, 141, 162, 194, 270 
quantization of, 127 
system, 268, 270 
waves, 140 

Stability, 17 
limit points, in neural network, 9, 12 
of memory storage, 80 
of classical molecules, 120 
of memory, neural net, 14, 16 
escape rate from metastable state, 81 